Electronic Journal of Statistics



ISSN: 1935-7524 



Exponential bounds for minimum 
contrast estimators 



Yuri Golubev 

Universite de Provence 
39, rue F. Joliot-Curie
13453 Marseille, France
e-mail: golubev.yuri@gmail.com

Vladimir Spokoiny 

Weierstrass Institute and
Humboldt University Berlin,
Mohrenstr. 39, 10117 Berlin, Germany
e-mail: spokoiny@wias-berlin.de

Abstract: The paper focuses on general properties of parametric minimum
contrast estimators. The quality of estimation is measured in terms of the
rate function related to the contrast, which allows us to derive exponential
risk bounds invariant with respect to the detailed probabilistic structure of
the model. This approach works well for small or moderate samples and covers
the case of a misspecified parametric model. Another important feature of the
presented bounds is that they may be used when the parametric set is
unbounded and non-compact. These bounds do not rely on the entropy or
covering numbers and can be easily computed. The most important statistical
fact resulting from the exponential bounds is a concentration inequality
which claims that minimum contrast estimators concentrate with a large
probability on the level set of the rate function. In typical situations,
every such set is a root-n neighborhood of the parameter of interest. We also
show that the obtained bounds can help in bounding the estimation risk and in
constructing confidence sets for the underlying parameters. Our general
results are illustrated for the case of an i.i.d. sample. We also consider
several popular examples including least absolute deviation estimation and
the problem of estimating the location of a change point. What we obtain in
these examples slightly differs from the usual asymptotic results presented
in the statistical literature. This difference is due to the unboundedness of
the parameter set and a possible model misspecification.

1. Introduction

One of the most fundamental ideas in statistics is to describe an unknown
distribution $\mathbb{P}$ of the observed data $Y \in \mathbb{R}^n$ with the
help of a simple parametric family $\{\mathbb{P}_\theta, \theta \in \Theta\}$,
where $\Theta$ is a subset of a finite dimensional space, say, of
$\mathbb{R}^p$. In this situation, the statistical model is characterized by the value of the




Imsart-ejs ver. 2008/08/29 file: ejs_2009_352.tex date: January 6, 2009 






parameter $\theta \in \Theta$ and the statistical inference about $\mathbb{P}$
is reduced to recovering $\theta$. The standard likelihood approach suggests
estimating $\theta$ by maximizing the corresponding likelihood function. The
maximum likelihood estimator can be generalized in several ways, resulting in
the so-called minimum contrast and M-estimators; see Huber (1967) and
Huber (1981). The main idea behind this generalization is to estimate the
underlying parameter $\theta$ by minimizing over $\Theta$ a contrast function
$-L(Y,\theta)$:

$$\tilde{\theta} = \operatorname*{argmin}_{\theta \in \Theta} \{-L(Y,\theta)\} = \operatorname*{argmax}_{\theta \in \Theta} L(Y,\theta). \tag{1.1}$$



The negative sign in this notation comes from the main example which we have
in mind, when $L(Y,\theta)$ is the log-likelihood or quasi log-likelihood. A
natural condition on the contrast function is that its expectation under the
true measure $\mathbb{P}_{\theta_0}$ is minimized at the true parameter
$\theta_0$, i.e.

$$\theta_0 = \operatorname*{argmax}_{\theta \in \Theta} \mathbb{E}_{\theta_0} L(Y,\theta). \tag{1.2}$$



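If $L(Y,\theta)$ is the log-likelihood ratio, that is,

$$L(Y,\theta) = \log \frac{d\mathbb{P}_\theta}{d\mathbb{P}_{\theta_0}}(Y),$$

then the value $-\mathbb{E}_{\theta_0} L(Y,\theta)$ coincides with the
Kullback-Leibler divergence $\mathcal{K}(\mathbb{P}_{\theta_0},\mathbb{P}_\theta)$
between $\mathbb{P}_{\theta_0}$ and $\mathbb{P}_\theta$. It is well known that
$\mathcal{K}(\mathbb{P}_{\theta_0},\mathbb{P}_\theta)$ is always non-negative and
$\mathcal{K}(\mathbb{P}_{\theta_0},\mathbb{P}_\theta) = 0$ if and only if
$\mathbb{P}_{\theta_0} = \mathbb{P}_\theta$.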

If the distribution $\mathbb{P}$ does not belong to the parametric family
$\{\mathbb{P}_\theta, \theta \in \Theta\}$, then the target of estimation can
be naturally defined as the point of minimum of $-\mathbb{E} L(Y,\theta)$. We
will see that this point $\theta_0$ indeed minimizes a special distance
between the underlying measure $\mathbb{P}$ and the measures
$\mathbb{P}_\theta$ from the given parametric family.

The classical parametric statistical theory focuses mostly on asymptotic
properties of the difference between $\tilde\theta$ and the true value
$\theta_0$ as the sample size $n$ tends to infinity. There is a vast
literature on this issue. We only mention the book Ibragimov and Khas'minskij
(1981), which provides a comprehensive study of asymptotic properties of
maximum likelihood and Bayesian estimators. Typical results claim that the
maximum likelihood and Bayes estimators are asymptotically optimal under
certain regularity conditions. Large deviation results about minimum contrast
estimators can be found in Jensen and Wood (1998) and Sieders and Dzhaparidze
(1987), while subtle small sample size properties of these estimators are
presented in Field (1982) and Field and Ronchetti (1990).

Another stream of the literature considers minimum contrast estimators in a
general i.i.d. situation, when the parameter set is a subset of some
functional space. We mention the papers van de Geer (1993), Birgé and Massart
(1993), Birgé and Massart (1998), Birgé (2006) and references therein. The
studies mostly focused on the concentration properties of the maximum
$\max_\theta L(Y,\theta)$ rather than on the properties of the estimator
$\tilde\theta$, which is the point of maximum of $L(Y,\theta)$. The
established results are based on deep probabilistic facts from the empirical
process theory; see e.g. van der Vaart and Wellner (1996). In this paper we
also focus on the properties of the maximum of $L(Y,\theta)$ over
$\theta \in \Theta$. However, we do not assume any particular structure of
the contrast. Our basic result claims that if for every $\theta \in \Theta$
the difference $L(Y,\theta) - L(Y,\theta_0)$ has exponential moments, then
under rather general and mild conditions, the maximum
$\max_\theta \{L(Y,\theta) - L(Y,\theta_0)\}$ has similar exponential moments.
In what follows, to keep the notation short, we omit the argument $Y$ in the
contrast function $L(Y,\theta)$, writing $L(\theta)$ instead of
$L(Y,\theta)$. However, one has to keep in mind that $L(\theta)$ is a random
field that depends on the observed data $Y$. We also denote

$$L(\theta,\theta_0) = L(\theta) - L(\theta_0).$$

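To explain the main idea of this paper, introduce the function

$$\mathfrak{M}(\mu,\theta,\theta_0) := -\log \mathbb{E} \exp\{\mu L(\theta,\theta_0)\}.$$

Let $\mu^*$ be a maximizer of this function w.r.t. $\mu$, i.e.

$$\mu^*(\theta) := \operatorname*{argmax}_{\mu} \mathfrak{M}(\mu,\theta,\theta_0). \tag{1.3}$$

The rate function is defined via the Legendre transform of $L(\theta,\theta_0)$:

$$\mathfrak{M}^*(\theta,\theta_0) := \max_{\mu} \mathfrak{M}(\mu,\theta,\theta_0) = -\log \mathbb{E} \exp\{\mu^*(\theta) L(\theta,\theta_0)\}. \tag{1.4}$$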



Similar notions have already appeared in Chernoff (1952) and Bahadur (1960)
for studying the models with i.i.d. observations.

Obviously $\mathfrak{M}^*(\theta,\theta_0) \ge \mathfrak{M}(0,\theta,\theta_0) = 0$.
The following identity follows immediately from the above definition:

$$\mathbb{E} \exp\{\mu^*(\theta) L(\theta,\theta_0) + \mathfrak{M}^*(\theta,\theta_0)\} = 1, \qquad \theta \in \Theta.$$

We aim to extend this pointwise identity to the supremum over
$\theta \in \Theta$, which in particular enables us to replace $\theta$ with
the estimator $\tilde\theta$. Unfortunately, in some situations,
$\mathbb{E} \exp \sup_\theta \{\mu^*(\theta) L(\theta,\theta_0) + \mathfrak{M}^*(\theta,\theta_0)\} = \infty$.
We illustrate this fact by some examples for a simple Gaussian linear model.



1.1. Examples for a linear Gaussian model 

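To illustrate how the quantities $\mu^*(\theta)$ and
$\mathfrak{M}^*(\theta,\theta_0)$ can be computed, let us consider the
simplest case where $L(\theta,\theta_0)$ is a Gaussian field.

Example 1.1. [Gaussian contrast] Let for each pair $\theta,\theta' \in \Theta$
the difference $L(\theta,\theta') = L(\theta) - L(\theta')$ be a Gaussian
random variable. In this case we call $L(\theta)$ a Gaussian contrast. With
$M(\theta,\theta') = -\mathbb{E} L(\theta,\theta')$ and
$D^2(\theta,\theta') = \operatorname{Var} L(\theta,\theta')$, the random
variable $L(\theta,\theta')$ is normal
$\mathcal{N}(-M(\theta,\theta'), D^2(\theta,\theta'))$. Moreover,

$$\mathfrak{M}(\mu,\theta,\theta_0) = -\log \mathbb{E} \exp\{\mu L(\theta,\theta_0)\} = \mu M(\theta,\theta_0) - \mu^2 D^2(\theta,\theta_0)/2$$

and the values $\mu^*(\theta)$, $\mathfrak{M}^*(\theta,\theta_0)$ defined in
(1.3)-(1.4) can be easily computed:

$$\mu^*(\theta) = \operatorname*{argmax}_{\mu} \{\mu M(\theta,\theta_0) - \mu^2 D^2(\theta,\theta_0)/2\} = \frac{M(\theta,\theta_0)}{D^2(\theta,\theta_0)}, \qquad \mathfrak{M}^*(\theta,\theta_0) = \frac{M^2(\theta,\theta_0)}{2 D^2(\theta,\theta_0)}.$$

The formula can be further simplified if $L(\theta)$ is a Gaussian log-likelihood.

Example 1.2. [Gaussian model] Let

$$L(\theta,\theta_0) = \log \frac{d\mathbb{P}_\theta}{d\mathbb{P}_{\theta_0}}(Y)$$

be a Gaussian random variable for any $\theta \in \Theta$, and in addition
$\mathbb{P} = \mathbb{P}_{\theta_0}$ for some $\theta_0 \in \Theta$. As in the
previous example, let $-M(\theta,\theta_0)$ and $D^2(\theta,\theta_0)$ denote
the mean and the variance of $L(\theta,\theta_0)$. The likelihood property
implies $\mathbb{E}_{\theta_0} \exp\{L(\theta,\theta_0)\} = 1$, yielding
$M(\theta,\theta_0) = D^2(\theta,\theta_0)/2$ and hence $\mu^*(\theta) = 1/2$
and $\mathfrak{M}^*(\theta,\theta_0) = M(\theta,\theta_0)/4$.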
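
The closed forms of Examples 1.1 and 1.2 can be checked numerically. The following is a minimal sketch, not part of the original argument: the function names and the brute-force grid search are illustrative assumptions, and the inputs `M` and `D2` stand for $M(\theta,\theta_0)$ and $D^2(\theta,\theta_0)$ at a fixed $\theta$.

```python
def m_fun(mu, M, D2):
    # -log E exp{mu * L} for a Gaussian increment L ~ N(-M, D2)
    return mu * M - mu * mu * D2 / 2.0


def gaussian_rate(M, D2):
    # Closed forms from Example 1.1: mu* = M / D^2 and M* = M^2 / (2 D^2)
    mu_star = M / D2
    return mu_star, M * M / (2.0 * D2)


def grid_argmax(M, D2, steps=20001, mu_max=2.0):
    # Brute-force maximization over a grid on [0, mu_max] (illustrative check only)
    best_mu, best_val = 0.0, 0.0
    for i in range(steps):
        mu = mu_max * i / (steps - 1)
        val = m_fun(mu, M, D2)
        if val > best_val:
            best_mu, best_val = mu, val
    return best_mu, best_val


if __name__ == "__main__":
    D2 = 3.0
    M = D2 / 2.0  # likelihood case of Example 1.2: M = D^2 / 2
    print(gaussian_rate(M, D2))
    print(grid_argmax(M, D2))
```

In the likelihood case $M = D^2/2$ both computations should agree with $\mu^* = 1/2$ and $\mathfrak{M}^* = M/4$ up to the grid resolution.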

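Finally we consider a classical linear Gaussian regression.

Example 1.3. [Linear Gaussian model] Consider the linear model
$Y = X\theta_0 + \sigma\varepsilon$, where $Y \in \mathbb{R}^n$,
$\theta \in \mathbb{R}^p$, $X$ is a known $n \times p$ matrix, and
$\varepsilon$ is a white Gaussian noise in $\mathbb{R}^n$, i.e. the
$\varepsilon_i$ are i.i.d. standard normal. Then

$$L(\theta) = -\|Y - X\theta\|_n^2/(2\sigma^2),$$

where $\|\cdot\|_n$ denotes the standard Euclidean norm in $\mathbb{R}^n$. Obviously

$$M(\theta,\theta_0) = \|X(\theta-\theta_0)\|_n^2/(2\sigma^2), \qquad D^2(\theta,\theta_0) = \|X(\theta-\theta_0)\|_n^2/\sigma^2,$$

and thus (see Example 1.2)

$$\mathfrak{M}^*(\theta,\theta_0) = \|X(\theta-\theta_0)\|_n^2/(8\sigma^2).$$

The log-likelihood ratio can be written as

$$L(\theta,\theta_0) = \langle X(\theta-\theta_0), \varepsilon \rangle_n/\sigma - \|X(\theta-\theta_0)\|_n^2/(2\sigma^2).$$

Let $k$ denote the rank of the matrix $X^\top X$. Obviously $k \le p$ and the
vectors $X(\theta-\theta_0)$ span a linear subspace $\mathcal{X}$ in
$\mathbb{R}^n$ of dimension $k$. Denote by $\Pi$ the projector in
$\mathbb{R}^n$ on $\mathcal{X}$. Then

$$\begin{aligned}
\sup_{\theta} \{\mu^*(\theta) L(\theta,\theta_0) + \mathfrak{M}^*(\theta,\theta_0)\}
&= \sup_{\theta} \Big\{ \frac{\langle X(\theta-\theta_0), \varepsilon \rangle_n}{2\sigma} - \frac{\|X(\theta-\theta_0)\|_n^2}{8\sigma^2} \Big\} \\
&= \sup_{u \in \mathbb{R}^n} \Big\{ \frac{\langle \Pi u, \varepsilon \rangle_n}{2\sigma} - \frac{\|\Pi u\|_n^2}{8\sigma^2} \Big\}
= \frac{1}{2} \|\Pi\varepsilon\|_n^2,
\end{aligned}$$

where the maximum is attained at any $u \in \mathbb{R}^n$ such that
$\Pi u = 2\sigma \Pi\varepsilon$. It is well known that
$\|\Pi\varepsilon\|_n^2$ follows the $\chi^2$-distribution with $k$ degrees of
freedom and

$$\mathbb{E}_{\theta_0} \exp \sup_{\theta} \{\mu^*(\theta) L(\theta,\theta_0) + \mathfrak{M}^*(\theta,\theta_0)\} = \mathbb{E} \exp\{\|\Pi\varepsilon\|_n^2/2\} = \infty.$$

However, for any positive $s < 1$, it holds by the same argument that

$$\sup_{\theta} \{\mu^*(\theta) L(\theta,\theta_0) + s\,\mathfrak{M}^*(\theta,\theta_0)\}
= \sup_{u \in \mathbb{R}^n} \Big\{ \frac{\langle \Pi u, \varepsilon \rangle_n}{2\sigma} - \frac{(2-s)\|\Pi u\|_n^2}{8\sigma^2} \Big\}
= \frac{\|\Pi\varepsilon\|_n^2}{4-2s},$$

and thus

$$\mathbb{E}_{\theta_0} \exp \sup_{\theta} \{\mu^*(\theta) L(\theta,\theta_0) + s\,\mathfrak{M}^*(\theta,\theta_0)\}
= \mathbb{E} \exp\Big\{ \frac{\|\Pi\varepsilon\|_n^2}{4-2s} \Big\}
= \Big( \frac{2-s}{1-s} \Big)^{k/2}.$$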

An important feature of this identity is that it only involves the effective
dimension $k$ of the parameter space and does not depend on the design $X$,
the noise level $\sigma^2$, the sample size $n$, etc. Later we show that such
a behaviour of the log-likelihood is not restricted to Gaussian linear models
and that it can be proved for a quite general statistical set-up.
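
The computation of Example 1.3 can be reproduced numerically. The sketch below is an illustration under stated assumptions: the design `X`, the noise vector and all function names are hypothetical, NumPy is assumed to be available, and `X` is assumed to have full column rank so that $k = p$. It compares the value of $\mu^* L + s\,\mathfrak{M}^*$ at the first-order maximizer with $\|\Pi\varepsilon\|_n^2/(4-2s)$ and evaluates the closed-form exponential moment.

```python
import numpy as np


def objective(X, eps, sigma, s, delta):
    # mu* L(theta, theta0) + s M*(theta, theta0) with mu* = 1/2 and delta = theta - theta0
    u = X @ delta
    return float(u @ eps) / (2.0 * sigma) - (2.0 - s) * float(u @ u) / (8.0 * sigma ** 2)


def maximizer(X, eps, sigma, s):
    # First-order condition: X^T X delta = 2 sigma X^T eps / (2 - s)
    coef, *_ = np.linalg.lstsq(X, eps, rcond=None)
    return 2.0 * sigma * coef / (2.0 - s)


def sup_closed_form(X, eps, s):
    # ||Pi eps||^2 / (4 - 2s), with Pi the projector on the column space of X
    q, _ = np.linalg.qr(X)  # orthonormal basis; assumes full column rank
    proj = q @ (q.T @ eps)
    return float(proj @ proj) / (4.0 - 2.0 * s)


def exp_moment(k, s):
    # E exp{chi2_k / (4 - 2s)} = ((2 - s) / (1 - s))^(k/2) for 0 <= s < 1
    return ((2.0 - s) / (1.0 - s)) ** (k / 2.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(20, 3))
    eps = rng.normal(size=20)
    sigma, s = 1.7, 0.5
    delta = maximizer(X, eps, sigma, s)
    print(objective(X, eps, sigma, s, delta), sup_closed_form(X, eps, s))
    print(exp_moment(3, s))
```

Note that neither the supremum nor its exponential moment depends on `sigma`, in line with the remark above; a Monte Carlo estimate of the exponential moment is deliberately avoided here because the integrand has infinite variance for every $s \in [0,1)$.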

1.2. Main result 

The examples from Section 1.1 suggest considering in the general situation
the maximum of the random field
$\mu^*(\theta) L(\theta,\theta_0) + s\,\mathfrak{M}^*(\theta,\theta_0)$ for
$s < 1$. The main result of the paper shows that under some technical
conditions this maximum is indeed stochastically bounded in a rather strong
sense. Namely, for some $\rho \in (0,1)$

$$\mathbb{E} \sup_{\theta \in \Theta} \exp\big\{ \rho \big[ \mu^*(\theta) L(\theta,\theta_0) + s\,\mathfrak{M}^*(\theta,\theta_0) \big] \big\} \le C(\rho,s), \tag{1.5}$$

where $C(\rho,s)$ is a constant that can be easily controlled in typical
examples. This result in particular yields that
$\mu^*(\tilde\theta) L(\tilde\theta,\theta_0)$ and
$\mathfrak{M}^*(\tilde\theta,\theta_0)$ have bounded exponential moments.
Another corollary of this fact is that $\tilde\theta$ concentrates on the
sets $\mathcal{A}(\mathfrak{z},\theta_0) = \{\theta : \mathfrak{M}^*(\theta,\theta_0) \le \mathfrak{z}\}$
for sufficiently large $\mathfrak{z}$ in the sense that the probability
$\mathbb{P}(\tilde\theta \notin \mathcal{A}(\mathfrak{z},\theta_0))$ is
exponentially small in $\mathfrak{z}$. Usually every such concentration set
is a root-n vicinity of the point $\theta_0$. See Section 2.3 for precise
formulations. Ibragimov and Khas'minskij (1981) stated a version of (1.5) for
the i.i.d. case and used it to prove consistency of $\tilde\theta$.

We briefly comment on some useful features of the basic inequality (1.5).
First of all, this bound is non-asymptotic and may be used even if the sample
size is small or moderate. It is also applicable when the parametric modeling
assumption is misspecified. Our results may be used in such cases as well,
with the "true" parameter $\theta_0$ defined as the maximum point of the
expected contrast: $\theta_0 = \operatorname{argmax}_\theta \mathbb{E} L(\theta)$.

Another interesting question is about the accuracy of estimation when the
parameter set is not compact. The typical results in the classical parametric
theory have been established for compact parametric sets since this
assumption considerably simplifies the conditions and the technical tools.
There exist very few results for the case of non-compact sets. See Ibragimov
and Khas'minskij (1981) for an example. Our conditions are quite mild and, in
particular, the parameter set can be non-compact and unbounded. Moreover, we
present some examples in Section 3 illustrating that the quality of the
minimum contrast estimation can heavily depend on topological properties of
$\Theta$ and on the behavior of the rate function
$\mathfrak{M}^*(\theta,\theta_0)$ for large $\theta$. The corresponding
accuracy of estimation can be different from the classical root-n behavior.

The paper is organized as follows. The main result is presented in Section 2.
Section 2.3 presents some useful corollaries of (1.5) describing
concentration properties of $\tilde\theta$, some risk bounds, and confidence
sets for the target parameter $\theta_0$ based on $L(\tilde\theta,\theta)$.
Section 2.4 specifies the approach to the important case of a smooth
contrast. In this situation the main conditions ensuring (1.5) are
substantially simplified. Section 3 illustrates how our approach applies to
the classical i.i.d. case, while Section 4 presents some applications of the
general exponential bound to three particular problems: estimation of the
median, of the scale parameter of an exponential model, and of the change
point location. Although these examples have already been studied, the
proposed approach reveals some new features of the classical least squares
and least absolute deviation estimators in the cases when the parametric
assumption is misspecified or the parameter set is not compact. In the case
of median estimation the result applies even if the observations do not have
the first moment. The last example in this section considers the prominent
change point problem. We show in particular that when the size of the jump is
completely unknown, the accuracy of estimation of its location differs from
the well known parametric rate $1/n$: it depends on the distance of the
change point to the edge of the observation interval and involves an extra
iterated-log factor.

2. Risk bound for the minimum contrast

This section presents a general exponential bound on the minimum contrast
value in a rather general set-up. Let $-L(\theta)$, $\theta \in \Theta$, be a
random contrast function of a finite dimensional parameter
$\theta \in \Theta \subset \mathbb{R}^p$ given on some probability space
$(\Omega, \mathcal{F}, \mathbb{P})$. We also assume that $L(\theta)$ is a
separable random field and $\mathbb{E} L(\theta)$ exists for all
$\theta \in \Theta$. The minimum contrast estimator is defined as a minimizer
of $-L(\theta)$ and the target of estimation is the value $\theta_0$ which
minimizes the expectation $-\mathbb{E} L(\theta)$. It is clear that for any
$\theta^\circ \in \Theta$

$$\tilde\theta = \operatorname*{argmax}_{\theta \in \Theta} L(\theta,\theta^\circ) \quad \text{and} \quad \theta_0 = \operatorname*{argmax}_{\theta \in \Theta} \mathbb{E} L(\theta,\theta^\circ).$$

Our study focuses on the value of the maximum in $\theta$ of the random field
$L(\theta,\theta_0)$:

$$L(\tilde\theta,\theta_0) = \sup_{\theta \in \Theta} L(\theta,\theta_0) = \sup_{\theta \in \Theta} \{L(\theta) - L(\theta_0)\}.$$

By definition, $L(\tilde\theta,\theta_0)$ is a non-negative random variable.

2.1. Preliminaries. The case of a discrete parameter set

The main goal of this paper is to obtain exponential bounds for the supremum
in $\theta$ of the random field $L(\theta,\theta_0)$, without specifying a
particular structure of the model or of the contrast function $L(\theta)$.
Instead we impose some conditions of finite exponential moments for the
increments $L(\theta,\theta') = L(\theta) - L(\theta')$. With
$\mathfrak{M}(\mu,\theta,\theta_0) := -\log \mathbb{E} \exp\{\mu L(\theta,\theta_0)\}$,
the global exponential moment condition reads as follows:

(EG) For any $\theta \in \Theta$ the set
$\mathcal{T}(\theta) = \{\mu \in (0,\infty) : \mathfrak{M}(\mu,\theta,\theta_0) < \infty\}$
is non-empty.

Note that $\mathcal{T}(\theta)$ is an interval because
$\mathfrak{M}(\mu,\theta,\theta_0) < \infty$ implies
$\mathfrak{M}(\mu',\theta,\theta_0) < \infty$ for all $\mu' < \mu$. Moreover,
in the basic example of the log-likelihood contrast, it holds
$\mathfrak{M}(1,\theta,\theta_0) = -\log \mathbb{E}_{\theta_0}(d\mathbb{P}_\theta/d\mathbb{P}_{\theta_0}) < \infty$
for all $\theta$ and the condition (EG) is fulfilled automatically with
$(0,1] \subset \mathcal{T}(\theta)$.

Under the condition (EG) the functions $\mu^*(\theta)$ and
$\mathfrak{M}^*(\theta,\theta_0)$ from (1.3)-(1.4) are non-trivial and
correctly defined. Usually these functions can be easily evaluated in a small
neighborhood of the target parameter $\theta_0$. However, it might be
difficult to compute them for all $\theta \in \Theta$. Therefore, in the
sequel we proceed with another function $\mu(\theta)$, which can be viewed as
a rough approximation of $\mu^*(\theta)$. Section 4 provides some examples.
So, let $\mu(\theta)$ be a given function taking values in
$\mathcal{T}(\theta)$. Define

$$\mathfrak{M}(\theta,\theta_0) := \mathfrak{M}(\mu(\theta),\theta,\theta_0) = -\log \mathbb{E} \exp\{\mu(\theta) L(\theta,\theta_0)\}.$$

The most important requirement on $\mu(\theta)$ is that
$\mathfrak{M}(\theta,\theta_0)$ is positive and increases as $\theta$ moves
away from $\theta_0$. By definition, for any $\theta \in \Theta$,

$$\mathbb{E} \exp\{\mu(\theta) L(\theta,\theta_0) + \mathfrak{M}(\theta,\theta_0)\} = 1. \tag{2.1}$$
This means that the random function
$\mu(\theta) L(\theta,\theta_0) + \mathfrak{M}(\theta,\theta_0)$ has bounded
exponential moments for every $\theta$. We aim to derive a similar fact for
the supremum of this function in $\theta \in \Theta$. More precisely, we are
interested in bounding the following value:

$$\mathfrak{Q}(\rho,s) := \mathbb{E} \sup_{\theta \in \Theta} \exp\big\{ \rho \big[ \mu(\theta) L(\theta,\theta_0) + s\,\mathfrak{M}(\theta,\theta_0) \big] \big\}, \tag{2.2}$$

where $\rho, s \in [0,1]$.

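We begin with a rough upper bound for the special case of a discrete
parameter set.

Theorem 2.1. Assume (EG) and let $\Theta$ be a discrete set. Then for any $s < 1$

$$\mathfrak{Q}(1,s) = \mathbb{E} \sup_{\theta \in \Theta} \exp\{\mu(\theta) L(\theta,\theta_0) + s\,\mathfrak{M}(\theta,\theta_0)\}
\le \sum_{\theta \in \Theta} \exp\{-(1-s)\mathfrak{M}(\theta,\theta_0)\}. \tag{2.3}$$

Proof. Since
$\mathbb{E} \exp\{\mu(\theta) L(\theta,\theta_0) + s\,\mathfrak{M}(\theta,\theta_0)\} = \exp\{-(1-s)\mathfrak{M}(\theta,\theta_0)\}$
by (2.1), we obviously have

$$\mathfrak{Q}(1,s) \le \sum_{\theta \in \Theta} \mathbb{E} \exp\{\mu(\theta) L(\theta,\theta_0) + s\,\mathfrak{M}(\theta,\theta_0)\} = \sum_{\theta \in \Theta} \exp\{-(1-s)\mathfrak{M}(\theta,\theta_0)\}.$$

□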

Usually, the function $\mathfrak{M}(\theta,\theta_0)$ grows rapidly as
$\theta$ moves away from $\theta_0$. This property is often sufficient to
bound the sum on the right-hand side of (2.3) by a fixed constant.
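
As a hedged illustration of how the sum in (2.3) stays bounded, consider a hypothetical one-dimensional grid $\Theta = \{\theta_0 + jh : j \in \mathbb{Z}\}$ with a quadratic rate $\mathfrak{M}(\theta,\theta_0) = c\,(\theta-\theta_0)^2$. Both the grid and the quadratic form are assumptions made only for this sketch, as are the function name and the truncation rule.

```python
import math


def discrete_bound(c, h, s, j_max=10_000):
    # Right-hand side of (2.3) for the hypothetical grid theta0 + j*h, j in Z,
    # with the assumed rate function c * (theta - theta0)**2.
    a = (1.0 - s) * c * h * h
    total = 1.0  # contribution of j = 0
    for j in range(1, j_max + 1):
        term = math.exp(-a * j * j)
        total += 2.0 * term  # symmetric contributions of +j and -j
        if term < 1e-17:     # remaining terms are numerically negligible
            break
    return total


if __name__ == "__main__":
    for s in (0.0, 0.5, 0.9):
        print(s, discrete_bound(c=1.0, h=1.0, s=s))
```

The bound grows as $s$ approaches 1, which matches the role of the factor $1-s$ in the exponent.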

Although Theorem 2.1 is a rather simple corollary of (2.1), the bound (2.3)
yields a number of useful statistical corollaries. Some of them are presented
in Section 2.3. However, even in the discrete case, this bound may be too
rough (see the example in Section 231). It is also clear that (2.3) is
useless in the continuous case. The next section demonstrates how the bound
(2.3) can be extended to the case of an arbitrary parameter set.

2.2. The general exponential bound 

Here we aim to extend the exponential bound (2.3) from the discrete case to
the case of an arbitrary finite dimensional parameter set. We apply the
standard approach which evaluates the supremum over the whole parameter set
$\Theta$ via a weighted sum of local maxima.
Define for any $\theta,\theta' \in \Theta$

$$\zeta(\theta) := \mu(\theta)\{L(\theta,\theta_0) - \mathbb{E} L(\theta,\theta_0)\}, \qquad \zeta(\theta,\theta') := \zeta(\theta) - \zeta(\theta').$$

Note that the dependence of $\zeta(\theta,\theta')$ on $\theta_0$ disappears
if $\mu(\theta) = \mu(\theta')$.
Usually the local properties of the centered contrast difference
$\zeta(\theta,\theta')$ are controlled by the variance
$D^2(\theta,\theta') = \operatorname{Var} \zeta(\theta,\theta')$, which
defines a semi-metric on $\Theta$; see, e.g., van der Vaart and Wellner
(1996). However, in some cases, it is more convenient to deal with a slightly
different metric which we denote by $\mathfrak{S}(\theta,\theta')$. This
metric usually bounds the standard deviation $D(\theta,\theta')$ from above.
Sections 2.4 and 3 present some typical examples of constructing such a
metric. Below in this section we assume that the metric
$\mathfrak{S}(\cdot,\cdot)$ is given. Define for any point
$\theta^\circ \in \Theta$ and a radius $\epsilon > 0$ the ball

$$\mathfrak{B}(\epsilon,\theta^\circ) = \{\theta : \mathfrak{S}(\theta,\theta^\circ) \le \epsilon\}.$$

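To control the local behavior of the process $L(\theta)$ within any such ball
$\mathfrak{B}(\epsilon,\theta^\circ)$, we impose the following local
exponential condition:

(EL) There exist $\epsilon > 0$, $\bar\lambda > 0$ and $\nu_0 \ge 1$ such that
for any $\theta^\circ \in \Theta$ and $\lambda \le \bar\lambda$

$$\sup_{\theta,\theta' \in \mathfrak{B}(\epsilon,\theta^\circ)} \log \mathbb{E} \exp\Big\{ 2\lambda \frac{\zeta(\theta,\theta')}{\mathfrak{S}(\theta,\theta')} \Big\} \le 2\nu_0^2\lambda^2.$$

In fact, this condition only requires that every normalized random increment
$\zeta(\theta,\theta')/\mathfrak{S}(\theta,\theta')$ has a bounded
exponential moment for some $\lambda > 0$. Then Lemma 5.8 from the Appendix
implies the prescribed quadratic behavior in $\lambda$ for
$\lambda \le \bar\lambda$.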

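For a fixed $\theta^\circ \in \Theta$ and $\epsilon' \le \epsilon$, by
$N(\epsilon',\epsilon,\theta^\circ)$ we denote the local covering number
defined as the minimal number of balls $\mathfrak{B}(\epsilon',\cdot)$
required to cover the ball $\mathfrak{B}(\epsilon,\theta^\circ)$. With this
covering number we associate the local entropy

$$\mathbb{Q}(\epsilon,\theta^\circ) := \sum_{k=1}^{\infty} 2^{-k} \log N(2^{-k}\epsilon,\epsilon,\theta^\circ).$$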

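We begin with a local result which bounds the maximum of the process
$L(\theta)$ over a local ball $\mathfrak{B}(\epsilon,\theta^\circ)$.

Theorem 2.2. Assume (EG) and (EL) with some $\epsilon > 0$, $\nu_0 \ge 1$, and
$\bar\lambda > 0$. Let also $\rho < 1$ be such that
$\rho\epsilon/(1-\rho) \le \bar\lambda$. Then for any $\theta^\circ \in \Theta$

$$\log \mathbb{E} \sup_{\theta \in \mathfrak{B}(\epsilon,\theta^\circ)} \exp\big\{ \rho \big[ \mu(\theta) L(\theta,\theta_0) + \mathfrak{M}(\theta,\theta_0) \big] \big\} \le \frac{2\nu_0^2\epsilon^2\rho^2}{1-\rho} + (1-\rho)\,\mathbb{Q}(\epsilon,\theta^\circ).$$

The next theorem is the global bound which generalizes the upper bound from
Theorem 2.1.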

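Theorem 2.3. Assume (EG) and (EL) for some $\bar\lambda, \nu_0, \epsilon$, and
let $\pi(\cdot)$ be a $\sigma$-finite measure on $\Theta$ such that

$$\sup_{\theta \in \mathfrak{B}(\epsilon,\theta^\circ)} \frac{\pi(\mathfrak{B}(\epsilon,\theta^\circ))}{\pi(\mathfrak{B}(\epsilon,\theta))} \le \nu_1 \tag{2.4}$$

for some $\nu_1 \in [1,\infty)$. Let for some $\rho, s < 1$ it hold
$\rho\epsilon/(1-\rho) \le \bar\lambda$ and let the function
$\mathfrak{M}_\epsilon(\theta^\circ,\theta_0) = \inf_{\theta \in \mathfrak{B}(\epsilon,\theta^\circ)} \mathfrak{M}(\theta,\theta_0)$
fulfill

$$\mathfrak{H}_\epsilon(\rho,s) := \log \Big( \int_\Theta \frac{1}{\pi(\mathfrak{B}(\epsilon,\theta))} \exp\{-\rho(1-s)\mathfrak{M}_\epsilon(\theta,\theta_0)\}\, \pi(d\theta) \Big) < \infty. \tag{2.5}$$

Let finally $\mathbb{Q}(\epsilon,\theta^\circ) \le \mathbb{Q}(\epsilon)$ for
all $\theta^\circ \in \Theta$. Then the value $\mathfrak{Q}(\rho,s)$ from
(2.2) satisfies

$$\log \mathfrak{Q}(\rho,s) \le \frac{2\nu_0^2\epsilon^2\rho^2}{1-\rho} + (1-\rho)\,\mathbb{Q}(\epsilon) + \log(\nu_1) + \mathfrak{H}_\epsilon(\rho,s). \tag{2.6}$$

As in Theorem 2.1, proper growth conditions on the function
$\mathfrak{M}(\theta,\theta_0)$ ensure that the integral
$\mathfrak{H}_\epsilon(\rho,s)$ in (2.6) is bounded by a fixed constant.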

2.3. Some corollaries 

This section demonstrates how Theorems 2.1-2.3 can be used in the statistical
analysis of the minimum contrast estimator
$\tilde\theta = \operatorname{argmax}_{\theta \in \Theta} L(\theta)$. We show
that probabilistic properties of this estimator may be easily derived from
the following inequality: for prescribed $\rho, s < 1$,

$$\mathbb{E} \exp\big\{ \rho \big[ \mu(\tilde\theta) L(\tilde\theta,\theta_0) + s\,\mathfrak{M}(\tilde\theta,\theta_0) \big] \big\} \le \mathfrak{Q}(\rho,s), \tag{2.7}$$

which obviously follows from Theorem 2.3 and the definition (2.2) of
$\mathfrak{Q}(\rho,s)$.

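2.3.1. A risk bound for the "natural" loss

A first corollary of Theorem 2.1 presents exponential bounds separately for
the minimum contrast $L(\tilde\theta,\theta_0)$ and for the "natural" loss
$\mathfrak{M}(\tilde\theta,\theta_0)$.

Corollary 2.4. For any $\rho, s < 1$

$$\mathbb{E} \exp\{\rho\,\mu(\tilde\theta) L(\tilde\theta,\theta_0)\} \le \mathfrak{Q}(\rho,0), \tag{2.8}$$

$$\mathbb{E} \exp\{\rho s\,\mathfrak{M}(\tilde\theta,\theta_0)\} \le \mathfrak{Q}(\rho,s). \tag{2.9}$$

Substituting $s = 0$ in (2.7) yields the first bound. To prove the second
one, notice that $L(\tilde\theta,\theta_0) \ge 0$. Therefore the elementary
inequality $\mathbf{1}\{x \ge 0\} \le \exp(\mu x)$ for any $\mu > 0$ yields
(see also (2.7))

$$\begin{aligned}
\mathbb{E} \exp\{\rho s\,\mathfrak{M}(\tilde\theta,\theta_0)\}
&= \mathbb{E} \exp\{\rho s\,\mathfrak{M}(\tilde\theta,\theta_0)\}\, \mathbf{1}\{L(\tilde\theta,\theta_0) \ge 0\} \\
&\le \mathbb{E} \exp\{\rho s\,\mathfrak{M}(\tilde\theta,\theta_0) + \rho\,\mu(\tilde\theta) L(\tilde\theta,\theta_0)\} \le \mathfrak{Q}(\rho,s).
\end{aligned}$$

Notice that the exponential bound (2.9) implies a similar risk bound for a
polynomial loss in $\mathfrak{M}(\tilde\theta,\theta_0)$; see the Appendix for
a precise result.
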
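2.3.2. Concentration properties of the estimator $\tilde\theta$

The assertion (2.7) can be used for establishing the concentration property
of the estimator $\tilde\theta$. Consider the sets

$$\mathcal{A}(r,\theta_0) = \{\theta : \mathfrak{M}(\theta,\theta_0) \le r\}$$

for some $r > 0$. The next result shows that the estimator $\tilde\theta$
leaves the set $\mathcal{A}(r,\theta_0)$ with an exponentially small
probability of order $\exp(-\rho s r)$.

Corollary 2.5. For any $\rho, s < 1$, it holds

$$\mathbb{P}(\tilde\theta \notin \mathcal{A}(r,\theta_0)) \le \mathfrak{Q}(\rho,s) \exp(-\rho s r).$$

Proof. The inequalities $L(\tilde\theta,\theta_0) \ge 0$ and
$\mathfrak{M}(\tilde\theta,\theta_0) > r$ for
$\tilde\theta \notin \mathcal{A}(r,\theta_0)$ imply

$$e^{\rho s r}\, \mathbb{P}(\tilde\theta \notin \mathcal{A}(r,\theta_0)) \le \mathbb{E} \exp\big\{ \rho \big[ \mu(\tilde\theta) L(\tilde\theta,\theta_0) + s\,\mathfrak{M}(\tilde\theta,\theta_0) \big] \big\} \le \mathfrak{Q}(\rho,s)$$

and the assertion follows. □

In typical situations, $\mathfrak{M}(\theta,\theta_0)$ is proportional to the
sample size $n$ and each set $\mathcal{A}(r,\theta_0)$ corresponds to a
root-n neighborhood of the point $\theta_0$. See Section 3 for applications
related to the i.i.d. case.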
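
The bound of Corollary 2.5 can be inverted to obtain the radius $r$ of a concentration set with a prescribed probability level. The sketch below is only an illustration: the function names and the numerical value used for $\mathfrak{Q}(\rho,s)$ are hypothetical, and in practice $\mathfrak{Q}(\rho,s)$ has to be bounded by one of the theorems above. The check `q >= 1` uses the fact that the supremum in (2.2) is at least the value at $\theta = \theta_0$, which equals 1.

```python
import math


def tail_bound(q, rho, s, r):
    # Corollary 2.5: P(estimator outside A(r, theta0)) <= Q(rho, s) * exp(-rho * s * r)
    return min(1.0, q * math.exp(-rho * s * r))


def concentration_radius(q, rho, s, alpha):
    # Smallest r for which the bound of Corollary 2.5 does not exceed alpha
    if not (0.0 < rho < 1.0 and 0.0 < s < 1.0 and 0.0 < alpha < 1.0 and q >= 1.0):
        raise ValueError("expected 0 < rho, s, alpha < 1 and q >= 1")
    return math.log(q / alpha) / (rho * s)


if __name__ == "__main__":
    q = 2.0  # hypothetical value of Q(rho, s)
    r = concentration_radius(q, rho=0.5, s=0.5, alpha=0.05)
    print(r, tail_bound(q, 0.5, 0.5, r))
```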

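2.3.3. Confidence sets based on $L(\tilde\theta,\theta)$

Next we discuss how the exponential bound (2.7) can be used for constructing
confidence sets for the target $\theta_0$ based on the optimized contrast
$L(\tilde\theta,\theta)$. The inequality (2.8) claims that
$L(\tilde\theta,\theta_0)$ is stochastically bounded. This justifies the
following construction of confidence sets:

$$\mathcal{E}(\mathfrak{z}) = \{\theta \in \Theta : L(\tilde\theta,\theta) \le \mathfrak{z}\}.$$

To evaluate the covering probability, consider first the case when
$\mu(\theta) \ge \mu_* > 0$ uniformly in $\theta \in \Theta$. The next result
claims that $\mathcal{E}(\mathfrak{z})$ fails to cover the true value
$\theta_0$ with a probability which decreases exponentially with
$\mathfrak{z}$.

Corollary 2.6. Assume that $\mu(\theta) \ge \mu_* > 0$. Then for any
$\mathfrak{z} > 0$ and any $\rho < 1$

$$\mathbb{P}(\theta_0 \notin \mathcal{E}(\mathfrak{z})) \le \mathfrak{Q}(\rho,0) \exp\{-\rho\mu_*\mathfrak{z}\}.$$

Proof. The bound (2.8) implies

$$\begin{aligned}
\mathbb{P}(\theta_0 \notin \mathcal{E}(\mathfrak{z})) &= \mathbb{P}(L(\tilde\theta,\theta_0) > \mathfrak{z}) \\
&\le \mathbb{E} \exp\{-\rho\,\mu(\tilde\theta)\mathfrak{z}\} \exp\{\rho\,\mu(\tilde\theta) L(\tilde\theta,\theta_0)\} \\
&\le \exp\{-\rho\mu_*\mathfrak{z}\}\, \mathbb{E} \exp\{\rho\,\mu(\tilde\theta) L(\tilde\theta,\theta_0)\} \\
&\le \mathfrak{Q}(\rho,0) \exp\{-\rho\mu_*\mathfrak{z}\}
\end{aligned}$$

as required. □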
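In the case when the function $\mu(\theta)$ cannot be uniformly bounded from
below by a positive constant, we assume that such a bound exists for every
set $\mathcal{A}(r,\theta_0)$. Denote

$$\mu_*(r) := \inf_{\theta \in \mathcal{A}(r,\theta_0)} \mu(\theta).$$

Then

$$\mathbb{P}(\theta_0 \notin \mathcal{E}(\mathfrak{z})) \le \mathbb{P}(\theta_0 \notin \mathcal{E}(\mathfrak{z}),\, \tilde\theta \in \mathcal{A}(r,\theta_0)) + \mathbb{P}(\tilde\theta \notin \mathcal{A}(r,\theta_0))$$

and combining Corollaries 2.5-2.6 yields

Corollary 2.7. For any $\mathfrak{z} > 0$, any $\rho, s < 1$ and any $r > 0$

$$\mathbb{P}(\theta_0 \notin \mathcal{E}(\mathfrak{z})) \le \mathfrak{Q}(\rho,0) \exp\{-\rho\mu_*(r)\mathfrak{z}\} + \mathfrak{Q}(\rho,s) \exp\{-\rho s r\}.$$

A reasonable choice of $r$ in this bound is given by the balance relation
$\mu_*(r)\mathfrak{z} = s r$, which equalizes the two exponents. With this
choice the bound of Corollary 2.6 may be replaced by

$$\mathbb{P}(\theta_0 \notin \mathcal{E}(\mathfrak{z})) \le 2\mathfrak{Q}(\rho,s) \exp\{-\rho\mu_*(r)\mathfrak{z}\}.$$
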
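The two-term bound of Corollary 2.7 and the balance relation can be evaluated directly. The sketch below is illustrative only: all function names and numerical inputs are hypothetical, and it assumes for simplicity that $\mu_*(r)$ equals a constant `mu_low` that does not depend on $r$, so that the balance relation can be solved in closed form.

```python
import math


def noncoverage_bound(q0, qs, rho, s, mu_low, z, r):
    # Corollary 2.7: Q(rho,0) exp{-rho mu_*(r) z} + Q(rho,s) exp{-rho s r}
    return q0 * math.exp(-rho * mu_low * z) + qs * math.exp(-rho * s * r)


def balanced_radius(mu_low, z, s):
    # Solves mu_*(r) z = s r under the simplifying assumption mu_*(r) = mu_low
    return mu_low * z / s


if __name__ == "__main__":
    q0, qs = 1.5, 2.0  # hypothetical values of Q(rho, 0) and Q(rho, s)
    rho, s, mu_low, z = 0.5, 0.5, 0.4, 10.0
    r = balanced_radius(mu_low, z, s)
    print(r, noncoverage_bound(q0, qs, rho, s, mu_low, z, r))
    print(2.0 * qs * math.exp(-rho * mu_low * z))  # simplified bound with the factor 2
```

With the balanced radius both exponential terms carry the same exponent, so the two-term bound does not exceed the simplified bound whenever $\mathfrak{Q}(\rho,0) \le \mathfrak{Q}(\rho,s)$.
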
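2.4. Exponential bounds for smooth contrasts

This section deals with the case when the contrast $L(\theta)$ is a smooth
function of $\theta$. In this situation, the local condition (EL) is easy to
verify. Moreover, the local balls $\mathfrak{B}(\epsilon,\theta)$ nearly
coincide with usual Euclidean ellipsoids and the local entropy can be easily
bounded by an absolute constant depending only on the dimensionality $p$ of
the parameter space $\Theta$.

Suppose $\Theta$ is a convex set in $\mathbb{R}^p$ and the function
$L(\theta)$ along with the scaling factor $\mu(\theta)$ are differentiable
w.r.t. $\theta$. Below, the symbol $\nabla$ stands for the gradient w.r.t.
$\theta$.

Define

$$V(\theta) := \mathbb{E}\, \nabla\zeta(\theta) [\nabla\zeta(\theta)]^\top,$$

$$H(\lambda,\gamma,\theta) := \log \mathbb{E} \exp\Big\{ 2\lambda \frac{\gamma^\top \nabla\zeta(\theta)}{\sqrt{\gamma^\top V(\theta)\gamma}} \Big\}$$

for every unit vector $\gamma \in \mathbb{R}^p$. To simplify the
presentation, here and in what follows we assume that every matrix
$V(\theta)$ is non-degenerate. It is easy to see that
$H(0,\gamma,\theta) = 0$, $dH(0,\gamma,\theta)/d\lambda = 0$, and

$$\frac{d^2 H(\lambda,\gamma,\theta)}{d\lambda^2} \Big|_{\lambda=0} = \frac{4\gamma^\top \mathbb{E}\, \nabla\zeta(\theta) [\nabla\zeta(\theta)]^\top \gamma}{\gamma^\top V(\theta)\gamma} = 4.$$

Therefore for small $\lambda$, $H(\lambda,\gamma,\theta) \approx 2\lambda^2$.
Below we assume that this property is fulfilled uniformly in
$\theta \in \Theta$ and in $\gamma$ over the unit sphere $S^p$ in
$\mathbb{R}^p$.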
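(ED) There exists $\bar\lambda > 0$ such that for some $\nu_0 \ge 1$, uniformly
in $\theta \in \Theta$,

$$\sup_{|\lambda| \le \bar\lambda}\ \sup_{\gamma \in S^p} \lambda^{-2} H(\lambda,\gamma,\theta) \le 2\nu_0^2. \tag{2.10}$$

Now we define the metric $\mathfrak{S}(\theta,\theta')$ by

$$\mathfrak{S}^2(\theta,\theta') := \sup_{t \in [0,1]} (\theta-\theta')^\top V\big( (1-t)\theta' + t\theta \big) (\theta-\theta'). \tag{2.11}$$

Define also for every $\theta^\circ \in \Theta$ and $\epsilon > 0$ the
ellipsoid $\mathfrak{B}'(\epsilon,\theta^\circ)$ by

$$\mathfrak{B}'(\epsilon,\theta^\circ) = \{\theta : (\theta-\theta^\circ)^\top V(\theta^\circ) (\theta-\theta^\circ) \le \epsilon^2\}.$$

Obviously $\mathfrak{B}(\epsilon,\theta^\circ) \subset \mathfrak{B}'(\epsilon,\theta^\circ)$.

In what follows, we assume that the radius $\epsilon$ can be chosen in such a
way that the functions $V(\theta)$ and $\mathfrak{M}(\theta,\theta_0)$ have
bounded fluctuations within the ball $\mathfrak{B}'(\epsilon,\theta^\circ)$
for every $\theta^\circ \in \Theta$. More precisely, for a given function
$f(\cdot)$ define its magnitude over $\mathfrak{B}'(\epsilon,\theta^\circ)$ by

$$\mathfrak{A}_\epsilon f(\theta^\circ) := \sup_{\theta,\theta' \in \mathfrak{B}'(\epsilon,\theta^\circ)} \frac{f(\theta)}{f(\theta')}.$$

Similarly, the magnitude of the matrix $V(\theta)$ over
$\mathfrak{B}'(\epsilon,\theta^\circ)$ is computed as follows:

$$\mathfrak{A}_\epsilon V(\theta^\circ) := \sup_{\theta,\theta' \in \mathfrak{B}'(\epsilon,\theta^\circ)}\ \sup_{\gamma \in S^p} \frac{\gamma^\top V(\theta)\gamma}{\gamma^\top V(\theta')\gamma}.$$

Notice that under the condition $\mathfrak{A}_\epsilon V(\cdot) \le \nu_1$,
the topology induced by the metric $\mathfrak{S}(\cdot,\cdot)$ is (locally)
equivalent to the Euclidean topology, the set
$\mathfrak{B}(\epsilon,\theta^\circ)$ can be well approximated by the
ellipsoid $\mathfrak{B}'(\epsilon,\theta^\circ)$, and computing the local
entropy $\mathbb{Q}(\epsilon,\cdot)$ can be reduced to the Euclidean case; see
the Appendix for more detail.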

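Now we are ready to state an exponential bound for the contrast process in
the smooth case.

Theorem 2.8. Assume that (EG) and (ED) hold true with some $\nu_0$ and
$\bar\lambda > 0$. Suppose that there is a constant $\epsilon > 0$ such that
$\epsilon\rho/(1-\rho) \le \bar\lambda$ and for a fixed $\nu_1 \ge 1$ and each
$\theta \in \Theta$, it holds

$$\mathfrak{A}_\epsilon V(\theta) \le \nu_1. \tag{2.12}$$

Let for some $\rho, s < 1$ the function
$\mathfrak{M}_\epsilon(\theta^\circ,\theta_0) = \inf_{\theta \in \mathfrak{B}(\epsilon,\theta^\circ)} \mathfrak{M}(\theta,\theta_0)$
fulfill

$$\mathfrak{H}_\epsilon(\rho,s) := \log \Big( \omega_p^{-1} \epsilon^{-p} \int_\Theta \sqrt{\det V(\theta)}\, \exp\{-\rho(1-s)\mathfrak{M}_\epsilon(\theta,\theta_0)\}\, d\theta \Big) < \infty,$$

where $\omega_p$ is the Lebesgue measure of the unit ball in $\mathbb{R}^p$.
Then it holds

$$\log \mathfrak{Q}(\rho,s) \le (1-\rho)\,\mathbb{Q}_p + \frac{2\nu_0^2\epsilon^2\rho^2}{1-\rho} + 2p\log(\nu_1) + \mathfrak{H}_\epsilon(\rho,s).$$
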
Remark 2.1. The conditions of this theorem are very mild. (EG) only requires
that $L(\theta,\theta_0)$ has exponential moments. (ED) requires a similar
condition for the centered and normalized gradient $\nabla L(\theta)$. The
inequalities (2.12) are equivalent to uniform continuity of the function
$V(\theta)$.

Remark 2.2. The presented exponential bound requires that the value
$\mathfrak{H}_\epsilon(\rho,s)$ is finite. Fortunately, this can be easily
checked in typical situations. A typical example is given in Section 3, which
deals with the i.i.d. case.

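2.4.1. A risk bound for $\tilde\theta - \theta_0$

Our main result controls the risk of the minimum contrast estimator in terms
of the rate function $\mathfrak{M}(\theta,\theta_0)$. In the case of a smooth
contrast, this result may be used to bound the classical estimation loss
$\tilde\theta - \theta_0$. The idea is to bound the rate function
$\mathfrak{M}(\theta,\theta_0)$ from below by a quadratic function in a
vicinity of the point $\theta_0$ and next to make use of the concentration
property of $\tilde\theta$.

Note that for any $\mu$, it obviously holds
$\mathfrak{M}(\mu,\theta_0,\theta_0) = 0$, and simple algebra yields for the
gradient of $\mathfrak{M}(\mu,\theta,\theta_0)$ at $\theta = \theta_0$

$$\nabla \mathfrak{M}(\mu,\theta,\theta_0) \big|_{\theta=\theta_0} = -\mu\, \mathbb{E} \nabla L(\theta) \big|_{\theta=\theta_0} = -\mu \nabla \mathbb{E} L(\theta_0) = 0.$$

So, $\mathfrak{M}(\mu,\theta,\theta_0)$ can be bounded from below and from
above in a vicinity of $\theta_0$ by its second-order Taylor expansion. The
same behavior can be expected for the optimized rate function
$\mathfrak{M}(\theta,\theta_0)$. This argument and the concentration property
from Corollary 2.5 lead to the following result:

Corollary 2.9. Suppose the conditions of Theorem 2.8 are satisfied and also,
for some $r > 0$, the function $\mathfrak{M}(\theta,\theta_0)$ fulfills

$$\mathfrak{M}(\theta,\theta_0) \ge (\theta-\theta_0)^\top V_0 (\theta-\theta_0), \qquad \theta \in \mathcal{A}(r,\theta_0),$$

for some positive matrix $V_0$. Then for any $\rho, s < 1$ and $\mathfrak{z} > 0$

$$\mathbb{P}\big( \|V_0^{1/2}(\tilde\theta-\theta_0)\|^2 > \mathfrak{z} \big) \le \mathfrak{Q}(\rho,s) \exp\{-\rho s \min\{\mathfrak{z}, r\}\}.$$

Proof. It is obvious that

$$\begin{aligned}
\{\|V_0^{1/2}(\tilde\theta-\theta_0)\|^2 > \mathfrak{z}\}
&\subset \{\|V_0^{1/2}(\tilde\theta-\theta_0)\|^2 > \mathfrak{z},\ \tilde\theta \in \mathcal{A}(r,\theta_0)\} \cup \{\tilde\theta \notin \mathcal{A}(r,\theta_0)\} \\
&\subset \{\mathfrak{M}(\tilde\theta,\theta_0) > \mathfrak{z},\ \tilde\theta \in \mathcal{A}(r,\theta_0)\} \cup \{\tilde\theta \notin \mathcal{A}(r,\theta_0)\} \\
&= \{\tilde\theta \notin \mathcal{A}(r \wedge \mathfrak{z},\theta_0)\}
\end{aligned}$$

and the result follows from Corollary 2.7. □

In the case of i.i.d. observations, the function
$\mathfrak{M}(\mu,\theta,\theta_0)$ and hence the matrix $V_0$ are
proportional to the sample size $n$ and the result of Corollary 2.9
automatically yields the root-n consistency of $\tilde\theta$; see Section 3
for more details.
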
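3. Quasi MLE for i.i.d. data

Let $Y = (Y_1, \ldots, Y_n)$ be an i.i.d. sample from a distribution $P$. By
$\mathbb{P}$ we denote the joint distribution of $Y$. Let also
$\mathcal{P} = \{P_\theta, \theta \in \Theta \subset \mathbb{R}^p\}$ be a
parametric family. In contrast to the standard parametric hypothesis which
assumes that $P \in \mathcal{P}$, in this section we focus on the quality of
estimation in the case when the underlying measure $P$ does not necessarily
belong to the parametric family $\mathcal{P}$. We will see that in this case
the maximum likelihood method estimates the point $\theta_0$ which minimizes
some special distance between $P$ and $P_\theta$ over $\theta \in \Theta$.

In the rest of this section, the family $\mathcal{P}$ and the underlying
measure $P$ are assumed to be dominated by a measure $P_0$. We denote by
$p(y,\theta)$ and $p(y)$ the corresponding densities:
$p(y,\theta) = dP_\theta/dP_0(y)$, $p(y) = dP/dP_0(y)$. The maximum likelihood
estimator $\tilde\theta$ of the underlying parameter $\theta_0$ is computed as
follows:

$$\tilde\theta = \operatorname*{argmax}_{\theta \in \Theta} L(\theta) = \operatorname*{argmax}_{\theta \in \Theta} \sum_{i=1}^{n} \ell(Y_i,\theta),$$

where $\ell(y,\theta) = \log p(y,\theta)$. Denote
$\ell(y,\theta,\theta') = \ell(y,\theta) - \ell(y,\theta')$ and

$$\mathfrak{m}(\mu,\theta,\theta_0) := -\log E \exp\{\mu\,\ell(Y_1,\theta,\theta_0)\}.$$

The i.i.d. structure of the observations $Y$ implies that

$$\mathfrak{M}(\mu,\theta,\theta_0) = n\,\mathfrak{m}(\mu,\theta,\theta_0).$$

This enables us to redefine the function $\mu^*(\theta)$ in terms of the
function $\mathfrak{m}(\cdot,\theta,\theta_0)$ corresponding to the marginal
distribution $P$:

$$\mu^*(\theta) = \operatorname*{argmax}_{\mu} \mathfrak{m}(\mu,\theta,\theta_0),$$

and $\mu(\theta)$ can be interpreted as an approximation of $\mu^*(\theta)$.
Denote also

$$\mathfrak{m}(\theta,\theta_0) = \mathfrak{m}(\mu(\theta),\theta,\theta_0),$$

and for
$\zeta_1(\theta) = \mu(\theta)\{\ell(Y_1,\theta,\theta_0) - E\ell(Y_1,\theta,\theta_0)\}$
define

$$v(\theta) = E\, \nabla\zeta_1(\theta) [\nabla\zeta_1(\theta)]^\top,$$

$$h(\delta,\gamma;\theta) = \log E \exp\Big\{ 2\delta \frac{\gamma^\top \nabla\zeta_1(\theta)}{\sqrt{\gamma^\top v(\theta)\gamma}} \Big\}.$$

Notice that if $P$ coincides with $P_{\theta_0}$ and $\mu(\theta)$ is constant
in a vicinity of $\theta_0$, then $v(\theta_0)$ is the standard Fisher
information matrix. One can easily check that

$$h(0,\gamma;\theta) = 0, \qquad \frac{dh(0,\gamma;\theta)}{d\delta} = 0, \qquad \frac{d^2 h(\delta,\gamma;\theta)}{d\delta^2} \Big|_{\delta=0} = 4.$$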
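
The marginal rate function $\mathfrak{m}(\mu,\theta,\theta_0)$ and the scaling $\mathfrak{M} = n\,\mathfrak{m}$ can be illustrated for a Gaussian location family. The sketch below is an assumption-laden example, not part of the original text: it takes $P_\theta = \mathcal{N}(\theta,1)$ with $P = P_{\theta_0}$, uses Gauss-Hermite quadrature from NumPy to evaluate the expectation, and the function names are hypothetical. In this model $\mathfrak{m}(\mu,\theta,\theta_0) = (\mu-\mu^2)(\theta-\theta_0)^2/2$, so that $\mu^* = 1/2$ and $\mathfrak{m}^* = (\theta-\theta_0)^2/8$, in agreement with Example 1.3.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def marginal_rate(mu, theta, theta0, deg=60):
    # m(mu, theta, theta0) = -log E exp{mu * l(Y1, theta, theta0)} for Y1 ~ N(theta0, 1)
    x, w = hermegauss(deg)  # nodes and weights for the weight function exp(-x^2 / 2)
    y = theta0 + x
    ell = -(y - theta) ** 2 / 2.0 + (y - theta0) ** 2 / 2.0  # log-likelihood ratio
    expectation = np.sum(w * np.exp(mu * ell)) / np.sqrt(2.0 * np.pi)
    return float(-np.log(expectation))


def sample_rate(mu, theta, theta0, n):
    # i.i.d. scaling: M(mu, theta, theta0) = n * m(mu, theta, theta0)
    return n * marginal_rate(mu, theta, theta0)


if __name__ == "__main__":
    theta, theta0 = 0.8, 0.0
    print(marginal_rate(0.5, theta, theta0), (theta - theta0) ** 2 / 8.0)
    print(sample_rate(0.5, theta, theta0, n=100))
```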
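
It follows from Lemma 5.8 that for any $\nu_0 > 1$ and $\theta \in \Theta$
there exists $\delta(\theta,\nu_0) > 0$ such that
$h(\delta,\gamma;\theta) \le 2\nu_0^2\delta^2$ for all $\gamma \in S^p$ and
$\delta \le \delta(\theta,\nu_0)$. We assume a slightly stronger condition,
namely that $\delta(\theta)$ can be taken the same for all $\theta$, i.e.

$$\sup_{\theta \in \Theta}\ \sup_{\gamma \in S^p} h(\delta,\gamma;\theta) \le 2\nu_0^2\delta^2, \qquad \delta \le \bar\delta. \tag{3.1}$$

In some cases, the matrix $v(\theta)$ should be replaced by its
regularization $\bar{v}(\theta)$ to ensure this property; see Section 4.2 for
an example.

Independence of the $Y_i$'s implies that
$V(\theta) := \operatorname{Cov}\{\nabla\zeta(\theta)\} = n\,v(\theta)$ and

$$H(\lambda,\gamma,\theta) := \log \mathbb{E} \exp\Big\{ 2\lambda \frac{\gamma^\top \nabla\zeta(\theta)}{\sqrt{\gamma^\top V(\theta)\gamma}} \Big\} = n\,h(n^{-1/2}\lambda,\gamma;\theta)$$

for any $\lambda$ and any $\gamma \in S^p$. Therefore, if
$n^{-1/2}\lambda \le \bar\delta$, then by (3.1)

$$H(\lambda,\gamma,\theta) = n\,h(n^{-1/2}\lambda,\gamma;\theta) \le 2\nu_0^2\lambda^2$$

and the condition (ED) is fulfilled with $\bar\lambda \le n^{1/2}\bar\delta$.
Now one can easily reformulate Theorem 2.8 in terms of the marginal
distribution $P$.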

Theorem 3.1. Assume (3.1) for some $\delta > 0$ and $\nu_0 \ge 1$. Suppose that there
are constants $\epsilon > 0$ and $\nu_1 \ge 1$ such that for each $\theta \in \Theta$

2lew(6») < lyi. (3.2) 

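Let also for some $s, \rho < 1$ such that $\epsilon\rho/(1-\rho) \le n^{1/2}\delta$

$$\mathfrak{H}_\epsilon(\rho,s) \stackrel{\mathrm{def}}{=} \log\Bigl(\omega_p^{-1}\epsilon^{-p}\int_\Theta \sqrt{\det\{n\,v(\theta)\}}\,\exp\{-\rho(1-s)\,n\,\mathfrak{m}_\epsilon(\theta,\theta_0)\}\,d\theta\Bigr) < \infty,$$

where $\mathfrak{m}_\epsilon(\theta,\theta_0) = \inf_{\theta'\in\mathcal{B}(\epsilon,\theta)}\mathfrak{m}(\theta',\theta_0)$. Then the value $\mathfrak{Q}(\rho,s)$ from Section 2
fulfills

$$\log\mathfrak{Q}(\rho,s) \le (1-\rho)Q_p + \frac{2\nu_0^2\epsilon^2\rho^2}{1-\rho} + 2p\log(\nu_1) + \mathfrak{H}_\epsilon(\rho,s).$$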

The integral in $\mathfrak{H}_\epsilon(\rho,s)$ can be easily bounded in typical situations. The
result presented below involves some conditions on the marginal rate function
$\mathfrak{m}(\theta,\theta_0)$. Namely, it is assumed that this function is bounded from below by
a quadratic polynomial in a vicinity $\mathcal{A}(r,\theta_0) \stackrel{\mathrm{def}}{=} \{\theta : \mathfrak{m}(\theta,\theta_0) \le r\}$ of the point
$\theta_0$ for some fixed $r > 0$ and it increases at least logarithmically with the norm
$\|\theta - \theta_0\|$ outside of this neighborhood.

In particular, it is shown in Section 5 that for $n$ sufficiently large

Theorem 3.2. Assume (3.1) and let $\rho$ fulfill $\rho/(1-\rho) \le n\delta^2$. Suppose that
(3.2) holds with $\epsilon^2 = (1-\rho)/\rho$. Let for some $r > 0$, there are a positive
matrix $v_0$ and a constant $a_r > 0$ such that

Let for some P > , hold: 

CriP) '^^^ [ Jdct{v{e)} cxp{-/3me(0, 6»o)} d9 < oo. 

Finally, let n be sufficiently large to ensure 

br{n) =^ p(l - s)nr - f3r - a^^e - (p/2) logn < 0. (3.4) 

Then for some $C$ depending on $a_r, v_0, \nu_1, C_r(\beta)$ only, it holds

$$\log\mathfrak{Q}(\rho,s) \le Cp + \frac{p}{2}\log\bigl(|(1-\rho)(1-s)|^{-1}\bigr).$$

This bound together with Corollary 2.9 yields

F{na^,\\vl^\e - OoW > } + pC{p,s)) < exp{-ps min{3, r^^}} 

with $C(\rho,s) = C + \log\bigl(|(1-\rho)(1-s)|^{-1}\bigr)/2$. This result means root-n consistency
of $\tilde\theta$ in a rather strong sense.

4. Examples 

This section illustrates how the exponential bounds can be applied to some 
particular situations. To simplify technical details, we do not try to cover the 
most general case. Rather we aim to show that our basic conditions can be easily 
verified in typical situations. 

4.1. Estimation in the exponential model

The exponential model assumes that the observations $Y = (Y_1, \ldots, Y_n)$ are
i.i.d. exponential random variables from the exponential law $P_\theta$ with an unknown
parameter $\theta \in \mathbb{R}_+$; $P_\theta(Y_1 > y) = \exp(-\theta y)$. In this example we focus
on the classical parametric set-up assuming that the underlying measure $\mathbb{P}$
coincides with the product of $P_{\theta_0}$ for some $\theta_0 \in \mathbb{R}_+$. The corresponding maximum
likelihood contrast is given by

$$L(\theta) = \sum_{i=1}^{n}\ell(Y_i,\theta) = -\theta\sum_{i=1}^{n}Y_i + n\log(\theta)$$
yielding

$$\tilde\theta = n\Big/\sum_{i=1}^{n}Y_i, \qquad L(\tilde\theta,\theta) = n\log(\tilde\theta/\theta) + n(\theta/\tilde\theta - 1) = n\,\mathcal{K}(\tilde\theta,\theta),$$

where $\mathcal{K}(\theta,\theta') = \theta'/\theta - 1 - \log(\theta'/\theta)$ is the Kullback-Leibler divergence between
the exponential laws $P_\theta$ and $P_{\theta'}$.
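Define

$$h_1(\delta) \stackrel{\mathrm{def}}{=} \log\mathbb{E}\exp\{-\delta(\theta_0Y_1 - 1)\}.$$
Then, with $u = \theta/\theta_0 - 1$, it holds

$$\mathfrak{m}(\mu,\theta,\theta_0) \stackrel{\mathrm{def}}{=} -\log E_{\theta_0}\exp\{\mu\,\ell(Y_1,\theta,\theta_0)\} = \mu[u - \log(1+u)] - h_1(\mu u).$$
Therefore, with

$$\mathfrak{m}^*(u) = \max_{\mu}\bigl\{\mu[u - \log(1+u)] - h_1(\mu u)\bigr\},$$

$$\mu^*(u) = \operatorname{argmax}_{\mu}\bigl\{\mu[u - \log(1+u)] - h_1(\mu u)\bigr\},$$

the optimal choice of $\mu(\theta)$ is given by $\mu^*(\theta) = \mu^*(u)$ leading to $\mathfrak{m}^*(\theta,\theta_0) = \mathfrak{m}^*(u)$
for $u = \theta/\theta_0 - 1$. For applying Theorem 3.1 we need a lower bound for
$\mathfrak{m}^*(u)$. Simple algebra yields for $Y_1 \sim \mathrm{Exp}(\theta_0)$

$$h_1(\delta) = \delta - \log(1+\delta), \qquad \mathfrak{m}(\mu,\theta,\theta_0) = \log(1+\mu u) - \mu\log(1+u),$$

so that

$$\mu^*(u) = \operatorname{argmax}_{\mu}\{\log(1+\mu u) - \mu\log(1+u)\} = \frac{u - \log(1+u)}{u\log(1+u)}.$$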

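To simplify the calculations, we proceed further with the suboptimal choice
$\mu(\theta) \equiv \mu = 1/2$ instead of $\mu^*(\theta) = \mu^*(u)$, leading to $\mathfrak{m}(\theta,\theta_0) \stackrel{\mathrm{def}}{=} \mathfrak{m}(\mu,\theta,\theta_0) = \mathfrak{m}(u)$ with

$$\mathfrak{m}(u) \stackrel{\mathrm{def}}{=} \log(1+u/2) - 0.5\log(1+u) = \frac{1}{2}\log\Bigl(1 + \frac{u^2}{4(1+u)}\Bigr)$$

for $u = \theta/\theta_0 - 1 > -1$. It is easy to see that $\mathfrak{m}(u) \ge c_1u^2$ for $|u| \le 1$, and
$\mathfrak{m}(u) \ge c_2\log(1+u)$ for $u > 1$ with some $c_1, c_2 > 0$.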
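Next

$$\zeta_1(\theta) \stackrel{\mathrm{def}}{=} \mu\{\ell(Y_1,\theta) - \mathbb{E}\ell(Y_1,\theta)\} = -\mu\theta(Y_1 - 1/\theta_0),$$

$$\nabla\zeta_1(\theta) = -\mu(Y_1 - 1/\theta_0)$$

so that with $\sigma^2 = \operatorname{Var}Y_1 = 1/\theta_0^2$ it holds $v(\theta) \stackrel{\mathrm{def}}{=} \mathbb{E}[\nabla\zeta_1(\theta)]^2 = \mu^2\sigma^2 = 1/(4\theta_0^2)$,

$$\log\mathbb{E}\exp\{\delta\nabla\zeta_1(\theta)/\sqrt{v(\theta)}\} = h_1(\delta),$$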
and the condition (3.1) is obviously satisfied with some $\nu_0 < \infty$. Similarly, the
conditions (5.4) through (3.4) can be easily verified and Theorem 3.2 applied
with $s = 0$ yields

]Ec^p{pL{0,0o)/2} = Ec^p{pn%{0,0o)/2} < (Y^^jiyi- (4-1) 

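An important feature of this result is that it applies for the unbounded and
non-compact parameter set $(0, +\infty)$. Another corollary of (4.1) is that the true
parameter $\theta_0$ is covered with a high probability by the confidence set $\mathcal{E}(\mathfrak{z})$ of
the form

$$\mathcal{E}(\mathfrak{z}) \stackrel{\mathrm{def}}{=} \{\theta \in \Theta : \theta/\tilde\theta - 1 - \log(\theta/\tilde\theta) \le \mathfrak{z}/n\}$$
provided that $\mathfrak{z}$ is sufficiently large.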
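
A confidence set of this type is an interval around $\tilde\theta$, because $x \mapsto x - 1 - \log x$ is convex with its minimum at $x = 1$. The sketch below computes its endpoints by bisection. It is a hedged illustration only: the set is written as $\{\theta : \mathcal{K}(\tilde\theta,\theta) \le \mathfrak{z}/n\}$, the function names are hypothetical, and the value of `z` has to be supplied by the user, since the paper only states that it must be sufficiently large.

```python
import math

def kl_ratio(x):
    """x - 1 - log(x): the divergence K(theta_hat, theta) in terms of x = theta/theta_hat."""
    return x - 1.0 - math.log(x)

def confidence_interval(ys, z):
    """Endpoints of {theta : K(theta_hat, theta) <= z/n} for an exponential sample ys."""
    n = len(ys)
    theta_hat = n / sum(ys)          # maximum likelihood estimate
    level = z / n

    # Bracket the two roots of kl_ratio(x) = level on either side of x = 1.
    lo, hi = 1.0, 1.0
    while kl_ratio(lo) <= level:
        lo /= 2.0
    while kl_ratio(hi) <= level:
        hi *= 2.0

    def bisect(outside, inside):
        # Invariant: kl_ratio(outside) > level >= kl_ratio(inside).
        for _ in range(200):
            mid = 0.5 * (outside + inside)
            if kl_ratio(mid) > level:
                outside = mid
            else:
                inside = mid
        return inside

    return theta_hat * bisect(lo, 1.0), theta_hat * bisect(hi, 1.0)
```

The interval is not symmetric around $\tilde\theta$, which reflects the asymmetry of the Kullback-Leibler divergence.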



4.2. LAD contrast and median estimation 

Median or more generally quantile estimation is known to be more robust and
stable against outliers and it is frequently used in econometric studies; see
Koenker (2005), Koenker and Xiao (2006).



Suppose we are given a sample $Y = (Y_1, \ldots, Y_n)$. In the problem of median
estimation, these random variables are assumed i.i.d. and we are interested in
estimating the median $\theta_0$ which is a root of the equation

$$P(Y_1 < \theta_0) = P(Y_1 > \theta_0).$$

Alternatively, the median minimizes the value $E|Y_1 - \theta|$ provided that the
expectation of $|Y_1|$ is finite. This remark leads to the natural estimator $\tilde\theta$ of
the median as the minimizer of the contrast $-L(\theta) = \sum_i |Y_i - \theta|$:

$$\tilde\theta = \operatorname{argmax}_{\theta} L(\theta) = \operatorname{argmin}_{\theta}\sum_{i=1}^{n}|Y_i - \theta|.$$

If the $Y_i$'s are i.i.d. with the Laplace density $\exp(-|y - \theta_0|)/2$, then $L(\theta)$
coincides (up to an additive constant) with the log-likelihood. In the general case,
$L(\theta)$ can be treated as a quasi log-likelihood contrast. Later we also briefly
comment on the case when the $Y_i$'s are not i.i.d.
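
As a small illustration of the LAD contrast, the Python sketch below minimizes $\sum_i |Y_i - \theta|$ directly. The function names are our own. It relies only on the fact that the contrast is convex and piecewise linear with kinks at the observations, so that a minimizer can always be found among the sample points.

```python
import statistics

def lad_contrast(theta, ys):
    """The contrast -L(theta) = sum_i |Y_i - theta|."""
    return sum(abs(y - theta) for y in ys)

def lad_estimate(ys):
    """A minimizer of the LAD contrast, searched over the sample points."""
    return min(sorted(ys), key=lambda t: lad_contrast(t, ys))

sample = [3.0, -1.0, 10.0, 0.5, 2.0]
theta_tilde = lad_estimate(sample)
# For an odd sample size the minimizer is unique and equals the sample median.
assert theta_tilde == statistics.median(sample)
```

Replacing the largest observation by an arbitrarily large value leaves `lad_estimate` unchanged, which is the robustness property mentioned at the beginning of this subsection.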

Assume first that $Y_i$ has the density $p_\theta(y) = p(y - \theta)$ where $p(\cdot)$ is a centrally
symmetric function. To simplify the notation, we also assume that $\theta_0 = 0$. The
general case can be reduced to this one by a simple change of variables. The
density $p(y)$ is supposed to be positive and for $y > 0$ we define

$$\lambda(y) \stackrel{\mathrm{def}}{=} -(2y)^{-1}\log[2P(Y_1 > y)].$$

Equivalently, we can write $P(Y_1 > y) = e^{-2y\lambda(y)}/2$ for $y > 0$. The case with
$\lambda(y) \ge \lambda_0 > 0$ corresponds to light tails while $\lambda(y) \to 0$ as $|y| \to \infty$ means
heavy tails of the distribution $P$. Below we focus on the most interesting case
when $\lambda(y)$ is positive and monotonically decreases to zero in $y > 0$. For simplicity
of presentation we also assume that $\lambda(y)$ is sufficiently regular and its
first derivative $\lambda'(y)$ is uniformly continuous on $\mathbb{R}$. The assumption of heavy
tails implies that $[y\lambda(y)]' \in [0, 1]$ and hence,

$$|y\lambda'(y)| = |[y\lambda(y)]' - \lambda(y)| \le 1.$$

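Let

$$m(\theta) \stackrel{\mathrm{def}}{=} E|Y_1 - \theta|, \qquad q(\theta) \stackrel{\mathrm{def}}{=} P(Y_1 < \theta) - P(Y_1 > \theta).$$

Obviously $m'(\theta) \stackrel{\mathrm{def}}{=} dm(\theta)/d\theta = q(\theta)$. It is also clear that $|q(\theta)| \le 1$. Next, for
$\theta > 0$, it holds

$$\ell'(y,\theta,\theta_0) \stackrel{\mathrm{def}}{=} \frac{\partial}{\partial y}\ell(y,\theta,\theta_0) = \begin{cases} 0, & y \notin [0,\theta], \\ 2, & \text{otherwise}, \end{cases}$$

and $\ell(y,\theta,\theta_0) = -\theta$ for $y < 0$. Therefore, integration by parts yields

$$\begin{aligned}
E e^{\mu\ell(Y_1,\theta,\theta_0)} &= -\int e^{\mu\ell(y,\theta,\theta_0)}\,dP(Y_1 > y) \\
&= e^{-\mu\theta} + \int \mu\,\ell'(y,\theta,\theta_0)\,e^{\mu\ell(y,\theta,\theta_0)}P(Y_1 > y)\,dy \\
&= e^{-\mu\theta} + 2\mu\int_0^\theta e^{\mu(2y-\theta)}P(Y_1 > y)\,dy \\
&= e^{-\mu\theta} + \mu e^{-\mu\theta}\int_0^\theta e^{2y\{\mu - \lambda(y)\}}\,dy
\end{aligned}$$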

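and similarly for $\theta < \theta_0$. We now fix $\mu(\theta) = \lambda(\theta)$. Monotonicity of $\lambda(y)$ implies

$$E e^{\mu(\theta)\ell(Y_1,\theta,\theta_0)} \le e^{-\theta\lambda(\theta)}\{1 + \theta\lambda(\theta)\}.$$

Therefore, for $\theta > 0$,

$$\mathfrak{m}(\theta,\theta_0) \ge \theta\lambda(\theta) - \log\{1 + \theta\lambda(\theta)\}. \qquad (4.2)$$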

The same lower bound holds true for $\theta < 0$. For $\theta\lambda(\theta) \le 1$ it obviously holds

m(6i,6io) > 9^-X^{e)/2. 
Now we check the condition (3.1). Define

Co{e) E,{\Y, -e\- \Y,\) {\Y, -e\- \y,\). 
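Then, for $\theta > 0$

$$\nabla\zeta_0(\theta) = \mathbf{1}(Y_1 < \theta) - \mathbf{1}(Y_1 > \theta) - q(\theta), \qquad E|\nabla\zeta_0(\theta)|^2 = 1 - q^2(\theta),$$

$$\operatorname{Var}\zeta_0(\theta) = \operatorname{Var}\int_0^\theta \nabla\zeta_0(t)\,dt \le \theta\int_0^\theta E|\nabla\zeta_0(t)|^2\,dt = \theta\int_0^\theta \{1 - q^2(t)\}\,dt,$$

and

$$\theta^{-2}\operatorname{Var}\zeta_0(\theta) \le \theta^{-1}\int_0^\theta \{1 - q^2(t)\}\,dt \to 0, \qquad |\theta| \to \infty,$$

because $q(\theta) \to 1$. Next, $\zeta_1(\theta) \stackrel{\mathrm{def}}{=} \mu(\theta)\zeta_0(\theta)$ and

$$\nabla\zeta_1(\theta) \stackrel{\mathrm{def}}{=} d\zeta_1(\theta)/d\theta = \lambda(\theta)\nabla\zeta_0(\theta) + \theta\lambda'(\theta)\zeta_0(\theta)/\theta.$$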

Note that $|\nabla\zeta_0(\theta)| \le 1$ and $|\zeta_0(\theta)/\theta| \le 1$, and in addition $\lambda(\theta) \to 0$ and
$\operatorname{Var}(\zeta_0(\theta)/\theta) \to 0$ as $\theta \to \infty$, while $|\theta\lambda'(\theta)|$ remains bounded by one. This
easily implies the condition (3.1) for some fixed $\delta > 0$, $\nu_0 \ge 1$, and $v(\theta) \equiv 1$.
Moreover, if $\mathbb{E}|Y_1|^\gamma < \infty$ for some $\gamma > 0$, then the conditions of Theorem 3.2
are fulfilled. This theorem applied with $\rho = s$ and Corollary 2.4 lead to the
bound for the loss $u = |\tilde\theta - \theta_0|$:

$$\mathbb{E}\exp\bigl\{\rho^2 n\,[u\lambda(u) - \log\{1 + u\lambda(u)\}]\bigr\} \le \frac{C}{\rho^{1/2}(1-\rho)^{1/2}}$$

with some fixed constant $C$ provided that $n$ exceeds some minimal sample size
$n_0$.

The case of independent but non i.i.d. observations can be again reduced to 
the considered case using P = X]i=i ^^'^ defining the point 9o as a root 
of the equation 



i=l i=l 

4.3. Estimation of the location of a change point 

Suppose the observations $Y = (Y_1, \ldots, Y_n)$ follow the change point model:

$$Y_i = A\,\mathbf{1}(i \le \theta) + \sigma\varepsilon_i, \qquad i = 1, \ldots, n, \qquad (4.3)$$

where $\varepsilon_i$ is a standard white Gaussian noise. Our goal is to estimate the change
point location $\theta \in \Theta = \{1, \ldots, n-1\}$. The obtained results can be easily
extended to the case of non-Gaussian errors under some exponential moment
conditions.
We begin with the case when the amplitude $A$ is known. To estimate $\theta$, we
use the maximum likelihood estimator

$$\tilde\theta_A = \operatorname{argmax}_{\theta\in\Theta} L_A(\theta),$$

where the maximum likelihood contrast is given by

$$L_A(\theta) = \frac{A}{\sigma^2}\sum_{i=1}^{\theta}Y_i - \frac{A^2}{2\sigma^2}\sum_{i=1}^{\theta}1.$$

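Note that $L_A(\theta,\theta_0) = L_A(\theta) - L_A(\theta_0)$ is a Gaussian random variable for every $\theta$ with

$$M(\theta,\theta_0) \stackrel{\mathrm{def}}{=} -\mathbb{E}L_A(\theta,\theta_0) = \frac{A^2}{2\sigma^2}|\theta - \theta_0|,$$

$$D^2(\theta,\theta_0) \stackrel{\mathrm{def}}{=} \operatorname{Var}L_A(\theta,\theta_0) = \frac{A^2}{\sigma^2}|\theta - \theta_0| = 2M(\theta,\theta_0).$$

This yields for any $\mu > 0$

$$\mathfrak{M}(\mu,\theta,\theta_0) = \mu M(\theta,\theta_0) - \mu^2D^2(\theta,\theta_0)/2 = (\mu - \mu^2)M(\theta,\theta_0),$$

and the corresponding values $\mu^*(\theta)$, $\mathfrak{M}^*(\theta,\theta_0)$ can be easily computed:

$$\mu^*(\theta) \equiv 1/2, \qquad \mathfrak{M}^*(\theta,\theta_0) = M(\theta,\theta_0)/4.$$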
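
The known-amplitude case can be illustrated by a short Python sketch. This is an illustration under stated assumptions rather than the authors' code: the contrast is taken as the Gaussian log-likelihood up to terms that do not depend on $\theta$, the change point is indexed so that the first $\theta$ observations carry the jump, and all function names are hypothetical.

```python
def contrast_known_jump(ys, theta, A, sigma):
    """L_A(theta) = (A/sigma^2) * sum_{i<=theta} Y_i - A^2 * theta / (2 sigma^2)."""
    return A * sum(ys[:theta]) / sigma**2 - A**2 * theta / (2 * sigma**2)

def mle_known_jump(ys, A, sigma):
    """Maximizer of L_A over the parameter set {1, ..., n-1}."""
    n = len(ys)
    return max(range(1, n), key=lambda t: contrast_known_jump(ys, t, A, sigma))

def gaussian_rate(mu, M):
    """Rate function (mu - mu^2) * M of a Gaussian contrast with drift M and variance 2M."""
    return (mu - mu * mu) * M

# Noise-free data with a jump of size A = 2 on the first theta0 = 7 observations.
ys = [2.0] * 7 + [0.0] * 13
assert mle_known_jump(ys, A=2.0, sigma=1.0) == 7
```

On a grid of values of `mu`, the function `gaussian_rate` is maximal at `mu = 0.5`, where it equals `M / 4`.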

Therefore, for p < 1 , Theorem 12. II implies 

JBexp{p2^|6^-0o|} < 5^exp{-^ii^A/(0,0o)} 



2^expi- 

fc=0 ^ 



< 2>exp^-^^ii^4M!H = 



8ct2 J 1 - C(p) 



where C(p) exp{-p(l - p) A^ j (^a"^)] . By LemmaO 

with some constant Ci(r) . 

Now we switch to the case when $A > 0$ is an unknown parameter. In this
case, we cannot use the contrast $L_A(\theta)$ because it strongly depends on $A$.
To find a reasonable contrast, one can use the maximum likelihood principle.
Considering $A$ as a nuisance parameter and maximizing $L_A(\theta)$ w.r.t. $A > 0$
leads to the following estimator:

$$\tilde\theta = \operatorname{argmax}_{\theta}\Bigl\{\max_{A>0}L_A(\theta)\Bigr\} = \operatorname{argmax}_{\theta}\frac{1}{2\sigma^2\theta}\Bigl[\sum_{i=1}^{\theta}Y_i\Bigr]_+^2,$$
where $[x]_+ = \max(x, 0)$. In what follows we deal with a slightly modified version
of this estimator

$$\tilde\theta = \operatorname{argmax}_{\theta}L(\theta), \quad \text{with a new contrast} \quad L(\theta) = \frac{1}{\sigma\sqrt{\theta}}\sum_{i=1}^{\theta}Y_i,$$

which is again a Gaussian one. By the model equation (4.3), this contrast can
be represented in the form:

$$L(\theta) = \frac{A\min(\theta,\theta_0)}{\sigma\sqrt{\theta}} + \frac{1}{\sqrt{\theta}}\sum_{i=1}^{\theta}\varepsilon_i.$$



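It is easy to see that the drift $M(\theta,\theta_0) = -\mathbb{E}L(\theta,\theta_0)$ satisfies

$$M(\theta,\theta_0) = a\,d(\theta,\theta_0)$$

with $a = \sigma^{-1}A\sqrt{\theta_0}$ and

$$d(\theta,\theta') \stackrel{\mathrm{def}}{=} 1 - \sqrt{\min\{\theta/\theta', \theta'/\theta\}} = \begin{cases} 1 - \sqrt{\theta/\theta'}, & \theta < \theta', \\ 1 - \sqrt{\theta'/\theta}, & \theta \ge \theta'. \end{cases}$$

Similarly,

$$D^2(\theta,\theta') \stackrel{\mathrm{def}}{=} \operatorname{Var}L(\theta,\theta') = \frac{2|\theta - \theta'|}{(\sqrt{\theta} + \sqrt{\theta'})\sqrt{\max(\theta,\theta')}} = 2d(\theta,\theta')$$

and obviously, $M(\theta,\theta_0) = aD^2(\theta,\theta_0)/2$. Also $D^2(\theta,\theta_0) \le 2$ for all $\theta$. As $L(\theta)$
is a Gaussian contrast, it holds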

see Example 1.1. Note that for every $\theta \in \Theta$, the value $\mathfrak{M}^*(\theta,\theta_0)$ is bounded
by $a^2/8 = A^2\theta_0/(8\sigma^2)$. So, this example is quite special in the sense that the
Kullback-Leibler divergence between measures $\mathbb{P}_{\theta_0}$ and $\mathbb{P}_\theta$ does not grow to
infinity with $\theta$. We will see that this fact results in an extra loglog-factor in the
bound for the minimum contrast.
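
The boundedness of the drift can be seen in a small numerical sketch. The Python code below is an illustration only: it assumes the normalized contrast $L(\theta) = (\sigma\sqrt{\theta})^{-1}\sum_{i\le\theta}Y_i$ and the indexing in which the first $\theta_0$ observations carry the jump, and its function names are hypothetical. It checks on noise-free data that $L(\theta_0) - L(\theta) = a\,d(\theta,\theta_0)$ with $a = \sigma^{-1}A\sqrt{\theta_0}$.

```python
import math

def contrast_unknown_jump(ys, theta, sigma):
    """L(theta) = sum_{i<=theta} Y_i / (sigma * sqrt(theta))."""
    return sum(ys[:theta]) / (sigma * math.sqrt(theta))

def d(theta, theta_prime):
    """d(theta, theta') = 1 - sqrt(min(theta/theta', theta'/theta)); always in [0, 1)."""
    return 1.0 - math.sqrt(min(theta / theta_prime, theta_prime / theta))

def drift(theta, theta0, A, sigma):
    """M(theta, theta0) = a * d(theta, theta0) with a = A * sqrt(theta0) / sigma."""
    return A * math.sqrt(theta0) / sigma * d(theta, theta0)

# Noise-free observations: the mean of L(theta) can be read off directly.
n, theta0, A, sigma = 50, 12, 1.5, 2.0
ys = [A] * theta0 + [0.0] * (n - theta0)
for theta in range(1, n):
    gap = contrast_unknown_jump(ys, theta0, sigma) - contrast_unknown_jump(ys, theta, sigma)
    assert abs(gap - drift(theta, theta0, A, sigma)) < 1e-9
```

Since `d` never exceeds one, the drift stays below `a` however far `theta` is from `theta0`, in contrast to the known-amplitude case where it grows linearly in the distance.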

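For given $\epsilon > 0$ and $\theta^\circ \in \Theta$, the local ball $\mathcal{B}(\epsilon,\theta^\circ) = \{\theta : D(\theta,\theta^\circ) \le \epsilon\}$ can
be represented in the form

$$\mathcal{B}(\epsilon,\theta^\circ) = \{\theta : \theta^\circ(1 - \epsilon^2/2)^2 \le \theta \le \theta^\circ(1 - \epsilon^2/2)^{-2}\}$$

and it can be transformed into the usual symmetric interval around $\log\theta^\circ$ by
using the parameter $\log\theta$ instead of $\theta$:

$$\mathcal{B}(\epsilon,\theta^\circ) = \{\theta : |\log\theta - \log\theta^\circ| \le -2\log(1 - \epsilon^2/2)\}.$$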
This immediately implies that the local entropy $Q(\epsilon,\theta^\circ)$ is bounded by $Q = 1$
for all $\theta^\circ \in \Theta$.

Let the measure $\pi(\cdot)$ assign the mass 1 to any point $\theta = 1, \ldots, n$. Then
$\pi(\mathcal{B}(\epsilon,\theta^\circ))$ is equal to the number $n_\epsilon(\theta^\circ)$ of points $\theta$ in $\mathcal{B}(\epsilon,\theta^\circ)$, and it
obviously holds $n_\epsilon(\theta) \approx K(\epsilon)\theta$ with $K(\epsilon) = (1 - \epsilon^2/2)^{-2} - (1 - \epsilon^2/2)^2 > 0$

for e < 1 , so that ([2^ is fulfilled. Fix = 1/2 . The trivial lower bound 
971(6', 6*0) > yields for Sje{p,s) from (ES]) for any s < 1 : 

for some $C_1 > 0$. This yields by Theorem 2.3 and its Corollary 2.4 that

Eexp{pa^d(9,9o)/8} < Cslogn. (4.4) 
Combining this with Lemma 5.7 yields
A^9o 



E 



d{9,9o) <C|loglognr 



The extra log log-factor in this bound is due to the unbounded parameter
set. In the "classical" situation when the size $A$ of the jump is bounded
away from zero and infinity and the true "relative" location $\theta_0/n$ is bounded
away from the edge, similar calculations (not presented here) lead to a bound
$\mathbb{E}\exp\{C_1\rho A^2|\tilde\theta - \theta_0|\} \le C_2$ which does not involve any extra log-term; see
e.g. Csörgő and Horváth (1997) and references therein for asymptotic versions
of this result.

It is also interesting to compare this result with the accuracy of the maximum
likelihood method in the case where the magnitude $A$ of the jump is known. One
can see that there is a payment for the adaptation to the nuisance parameter
$A$, which takes the form of an extra loglog-factor. Another observation is that the
accuracy of estimation strongly depends on the true location $\theta_0$, more precisely,
on the value $a^2 = A^2\theta_0/\sigma^2$. In the "classical" situation this value is of order $n$
leading to the accuracy of order $n^{-1}\log\log(n)$. If the value $a^2$ is smaller in order
than $n$, then the accuracy becomes worse by the same factor. In particular, if
$A^2\theta_0/\sigma^2$ is of order one, then even consistency of $\tilde\theta$ cannot be claimed.

5. Proofs 

This section collects proofs of the main theorems and some auxiliary facts. 
5.1. Proof of Theorem [Ol 

Assume that $\theta^\circ \in \Theta$. First we establish a local bound for the maximum of the
process $L(\theta,\theta_0)$ over the local ball $\mathcal{B}(\epsilon,\theta^\circ) = \{\theta : \mathfrak{S}(\theta,\theta^\circ) \le \epsilon\}$.





Proof. The main step of the proof is a bound for the stochastic component 
C(6>,0') over the ball 'B{e,e°) for a fixed 6^ e S(e,0°) . 

Lemma 5.1. Assume that C(^) is a separable process satisfying for any given 
0° E O the condition (EL) . Then for any given E 'B{e,9°) and any A with 
A/e< A 

logiEexpj- sup ((f,^*)} < + 2i^o^^- 



Proof. The proof is based on the standard chaining argument (see e.g. van der Vaart and Wellner
(1996)). Without loss of generality, we assume that $Q(\epsilon,\theta^\circ) < \infty$. Then for any
integer fc > , there exists a 2~'^e-net Df^{e, 9°) in the local ball CB(e, 0°) hav- 
ing the cardinality N(2-'=e, e, 6°) . Using the nets Dk{e-, 0°) with fc = 1, . . . , 
1 , one can construct a chain connecting an arbitrary point in Dk{^, d°) and 
6^ . It means that one can find points 9^ G Dk{e, 9°), k ~ 1, . . . , /v — 1 , such 
that &{9k,9k-i) < 2-''+'^e for k = 1,...,K. Here 9k means 9 and 9o 
means 9^ . Notice that 9k can be constructed recurrently: 9k = Tk{9k+i), k = 
— 1 , . . . , 1 , where 

Tfc(0) = argmin &{9,9'). 
6i'eDfc(£,e°) 

It obviously holds for 9 G Dx(e, 0°) 

K 

C(0,0«) = ^C(0fe,0fc-i). 

fc=i 

For ^{9k,9k-i) ^ aek.ek-i)/e{9k,9k-i) it holds that 

C{9k,9k-i) = &{9k,9k^iM9k,9k^i) = 2eck^{9k,9k-i) 
with Ck=Cki9,9°) = e(6/fc,0fe_i)/(2e) < 2-^ and 

if 

sup C(0,e") < V sup C(^',Tfc_l(0')) 

= 2eV sup Cka9',Tk-i{9')). 
Since $c_k \le 2^{-k}$, Lemma 5.6 below and condition (EL) imply
logiBexpj- sup CiO,e^)\ 

r ^' 

< logiE;exp<^ 2A V sup Ck^{e' ,Tk-i{e')) 



< 



K p 

V2-'=log -Kexpl sup 2^Ck2\i{e',Tk-i{e'))] 



k=l 
K 



fc=i '-e'eDfe(e,e°) 

K 

< J2 2"'={logN(2-'=e, e, 6°) + 2vl\^). 

k=l 

These inequalities with the separability of Cl^:^") yield 
log^expi- sup C(^,f'')|= lim log^expj- sup Cl^,^" 

oo 

< 2-^{2vl\^ + logN(2-'^'e, e, 6>°)} < 2;.2^2 ^ q^^^ 



fe=i 



which completes the proof of the lemma. 

Now we are prepared to complete the proof of the theorem. Denote 



□ 



6>' = argmax{^(6>)iE:L(6>,6>o) +S['l(6»,6>o)}. 

ees(e,e°) 



It is clear that 



sup |MWi:(0,0o) + OT(0,0o)} 
< ix(e'^)L(e\e^) + ^(6\e^) + sup c(^, 

This yields by the Hölder inequality and Lemma 5.1 with $\lambda = \epsilon\rho/(1-\rho)$ that
logiBexpl sup p[Ai(0)L(0,0o)+97t(e,0o)]| 

<logiBexp|p[/.(0«)L(0«,0o)+97l(0«,0o)] +P sup C(^,0»)} 

ees(e,e°) 

< plogiE;exp{/z(0«)L(0«, 0o) + 9K(^", ^o)} 

+ (l-p)logI?expf-^ sup C(^,0«)} 
^ P ees{e,6»°) 

< (l-p)Q(e,r) + (l-p)2i.2 



1-p 
and the result follows. □
5.2. Proof of Theorem [Ql 

Theorem [32] implies a local bound for the process ii{6)L{6, do) +DJl{9, 6q) over 
any ball 'B{e, 6°) . To derive a global bound we apply the following general fact: 

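Lemma 5.2. Let $f(\theta)$ be a nonnegative function on $\Theta \subseteq \mathbb{R}^p$ and let for every
point $\theta \in \Theta$ a vicinity $U(\theta)$ be fixed such that $\theta' \in U(\theta)$ implies $\theta \in U(\theta')$.
Let also a measure $\pi(U(\theta))$ of the set $U(\theta)$ fulfill for every $\theta^\circ \in \Theta$

$$\sup_{\theta\in U(\theta^\circ)}\frac{\pi(U(\theta))}{\pi(U(\theta^\circ))} \le \nu. \qquad (5.1)$$

Then

$$\sup_{\theta\in\Theta}f(\theta) \le \nu\int_\Theta f^*(\theta)\frac{1}{\pi(U(\theta))}\,d\pi(\theta)$$

with

$$f^*(\theta) \stackrel{\mathrm{def}}{=} \sup_{\theta'\in U(\theta)}f(\theta').$$

Proof. For every $\theta^\circ \in \Theta$

$$\int_\Theta \frac{f^*(\theta)}{\pi(U(\theta))}\,d\pi(\theta) \ge \int_{U(\theta^\circ)}\frac{f^*(\theta)}{\pi(U(\theta))}\,d\pi(\theta) \ge f(\theta^\circ)\int_{U(\theta^\circ)}\frac{d\pi(\theta)}{\pi(U(\theta))}$$

because $\theta \in U(\theta^\circ)$ implies $\theta^\circ \in U(\theta)$ and hence, $f(\theta^\circ) \le f^*(\theta)$. Now by
(5.1)

$$\int_\Theta \frac{f^*(\theta)}{\pi(U(\theta))}\,d\pi(\theta) \ge \frac{f(\theta^\circ)}{\nu}\int_{U(\theta^\circ)}\frac{d\pi(\theta)}{\pi(U(\theta^\circ))} = \frac{f(\theta^\circ)}{\nu}$$

as required. □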

We are going to apply Lemma 5.2 with

$$f(\theta) = \exp\{\rho[\mu(\theta)L(\theta,\theta_0) + s\,\mathfrak{M}(\theta,\theta_0)]\}.$$

In view of the definition of Tle{9° , 9q) — minegs(£,e°) 9o) it follows from 
the local bound of Theorem 15.11 that 



logJEcxpl sup p[fi{9)L{9,9o) + sm{9,9o)]} 

< -p{l - s)m,i9°,9o) + (1 - 9°) + ^^t^. 

1 -P 

and the theorem follows directly from Lemma 5.2.
5.3. Proof of Theorems [Ql

Below by Cp we denote a generic constant (not necessarily the same) which 
only depends on the dimensionality p . First we show that the differentiability 
condition {ED) implies the local moment condition (EL) . 

Lemma 5.3. Assume that (ED) holds with some vq and A . Then for any 
6,6 Cz O and any A with |A| < A, 



logjEexp|2A ^^^'^l \ < 2v, 



2 \2 




A^ (5.2) 



Proof. For 0,6' € O , denote u = 6' 6 . With these notations 

L(6,6') = u'^ [ WL{6 + tu)dt. 
Jo 

Similar expressions hold for ]EL{6, 6') and for (^{6, 6') = L{6, 6') - 1EL{6, 6') : 

C{6,6')^vJ f \/(:{6 + tu)dt. 

The definition of &{6, 6') implies for any t E [0, 1] 

dof ^yu'^V{6 + tu)u 

' " e(6;W) - ' 

and therefore Lemma [5?6l and (|2.10[) with 7 — yield 



cit)4lS^±^dt 
y/-f^Vi6 + tu)-f 



< [ cit)logJEe.J2x4m±^]dt 
-Jo I ^j^V{6 + tu)^} 

as required. □ 

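Due to the next lemma, the smoothness of the contrast implies that the topology
induced by the metric $\mathfrak{S}(\cdot,\cdot)$ is locally equivalent to the Euclidean topology
and computing the local entropy $Q(\epsilon,\cdot)$ can be reduced to the Euclidean case.
Recall the notation

$$\mathcal{B}'(\epsilon,\theta^\circ) = \{\theta : (\theta - \theta^\circ)^\top V(\theta^\circ)(\theta - \theta^\circ) \le \epsilon^2\}.$$
The definition of $\mathcal{B}(\epsilon,\theta)$ implies that $\mathcal{B}(\epsilon,\theta^\circ) \subseteq \mathcal{B}'(\epsilon,\theta^\circ)$.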
Lemma 5.4. Assume (ED) with some $\bar\lambda$, and let, for some fixed $\nu_1 \ge 1$,



• (EL) is fulfilled for $\lambda \le \bar\lambda$, i.e. (5.2) holds for all $\lambda \le \bar\lambda$.

• $\sup_{\theta\in\Theta}Q(\epsilon,\theta) \le Q_p + p\log(\nu_1)$, where $Q_p$ is the entropy of the unit ball
in $\mathbb{R}^p$ in the Euclidean topology.

Proof. The first claim is an immediate corollary of Lemma l5.3l Fix any 6° € & . 
Linear transformation with the matrix V~^{9°) reduces the situation to the 
case when V{6°) = I and 'B'{e,6°) is a usual Euclidean ball for any eo < e . 
Moreover, by ()5.3|) . each elliptic set !B'(eo,0) for 6 £ 'B{e,6°) is nearly an 
Euclidean ball in the sense that the ratio of its largest and smallest axes (which 
is the ratio of the largest and smallest eigenvalues of V~^{6°)V'^{6)V~^{6°) ) is 
bounded by vi . Therefore, for any eg < e , a Euclidean net D'^(eo/vi) with the 
step Cq/i^i ensures a covering of ']i{e,0°) by the sets ^(eo,^"), 6° G T)''{e). 
Therefore, the corresponding covering number is boimded by [vie/ei^Y yielding 
the claimed bound for the local entropy. □ 

Now we are ready to proceed with the proof of Theorem 2.8. We make use
of the following technical result which helps to bound the global supremum of
a random function by an integral of local maxima.

Consider the ellipsoid $\mathcal{B}'(\epsilon,\theta^\circ) \stackrel{\mathrm{def}}{=} \{\theta : (\theta - \theta^\circ)^\top V(\theta^\circ)(\theta - \theta^\circ) \le \epsilon^2\}$. Its
Lebesgue measure fulfills $\pi(\mathcal{B}'(\epsilon,\theta^\circ)) = \omega_p\epsilon^p/\sqrt{\det\{V(\theta^\circ)\}}$ where $\omega_p$ is the
volume of the unit ball in $\mathbb{R}^p$. Condition (2.12) implies (5.1) with $\nu = \nu_1^p$ for
$\pi(U(\theta)) = \pi(\mathcal{B}'(\epsilon,\theta))$ and the Lebesgue measure $\pi$. Now the result follows
from Theorem 2.3.

5.4. Proof of Theorem 3.2

We start with some technical lemmas. 

Lemma 5.5. Suppose that for some r > , there are a positive matrix vq and 
a constant a,. > such that 



v{e)<vn, m{e,eo)>al{e-eo)^vo{e-eo), eeAi{r,eo) (5.4) 



e > 



(5.3) 



Then 



Then for any r] > 




det{nw(6>)}exp{-77nme(6>,6»o)}d0 < a'P {ujpeP + \7r/r)\P/'^) . 



Proof. The conditions of the lemma imply that for 6 E Ai{r, 9o) 



Vnme(6»,6/o) > [V^ar\\vy\e - 6/o)|| - e] 
Changing the variable $\theta$ by $u = (na_r^2)^{1/2}v_0^{1/2}(\theta - \theta_0)$ yields in view of (5.4)
that



(6,60) \Jdet{nv{e)}de 
< 1 f / du+ f exp{-77|i«f }dJ < a;P{cOpeP + 1^'?^/') 

O-r \J\\u\\<e JRP J 



as required. □ 

Next we bound the part of the integral $\mathfrak{H}_\epsilon(\rho,s)$ over the complement of
$\mathcal{A}(r,\theta_0)$. Namely, we aim to show that

/ Jdet{nv{9)} exp\ -p{l - s)nm,{e,eo)}de < a(/3)e-''-("^ (5.5) 

Under dnUl), it obviuosly holds for 9 e 0\Ai{r,9o) that me{9,9o) > r-a:;^e/ji 
and 

p{l-s)nm,{9,9o) > Pm,{9,9o) + {p{l - s)n - /3}{r - a;h/n) 
> Pm,{9,9o) + br{n) + {p/2)\ogn 

and dSl]) follows by det{nz;(0)} = nP det{v (9)} . 

Lemma 15.51 with = p{l — s) , (|5.5p . and br{n) < imply 



To finalize the proof, we apply Theorem 13.11 with e defined by the equation 
log£!(/3,s) < {1- p)Qp + 2iy^p + 2p\og{iyi) 

(l-p)(l-s)|P/2 + (l-s)P/2 



log 1 



< cp+|iog(i(i-p)(i-.)ri) 

where $C$ is a constant whose value depends on $a_r$, $v_0$, $\nu_1$, and $C_r(\beta)$. It is
also used that $Q_p \le Cp$ and $\log\omega_p^{-1} \le Cp$.

5.5. Auxiliary facts 

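Lemma 5.6. For any r.v.'s $\zeta_k$ and any nonnegative $\lambda_k$ such that $\Lambda = \sum_k\lambda_k \le 1$

$$\log\mathbb{E}\exp\Bigl(\sum_k\lambda_k\zeta_k\Bigr) \le \sum_k\lambda_k\log\mathbb{E}e^{\zeta_k}. \qquad (5.6)$$
Proof. Convexity of $e^x$ and concavity of $x^\Lambda$ imply

$$\mathbb{E}\exp\Bigl\{\frac{\Lambda}{\Lambda}\sum_k\lambda_k\bigl(\zeta_k - \log\mathbb{E}e^{\zeta_k}\bigr)\Bigr\} \le \Bigl\{\frac{1}{\Lambda}\sum_k\lambda_k\mathbb{E}\exp\bigl(\zeta_k - \log\mathbb{E}e^{\zeta_k}\bigr)\Bigr\}^{\Lambda} = 1.$$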
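
The inequality (5.6) is easy to test numerically on a finite probability space. The Python sketch below is only an illustration with hypothetical function names and an arbitrarily chosen joint distribution; the variables $\zeta_k$ need not be independent.

```python
import math

def log_mgf_combination(outcomes, lambdas):
    """log E exp(sum_k lambda_k * zeta_k); outcomes is a list of (probability, (zeta_1, ..., zeta_K))."""
    return math.log(sum(
        p * math.exp(sum(lam * z for lam, z in zip(lambdas, zs)))
        for p, zs in outcomes
    ))

def weighted_log_mgfs(outcomes, lambdas):
    """sum_k lambda_k * log E exp(zeta_k), the right-hand side of (5.6)."""
    total = 0.0
    for k, lam in enumerate(lambdas):
        total += lam * math.log(sum(p * math.exp(zs[k]) for p, zs in outcomes))
    return total

# A small joint distribution of three dependent variables; the weights sum to 0.9 <= 1.
outcomes = [(0.2, (1.0, -1.0, 0.5)), (0.5, (0.0, 2.0, -1.0)), (0.3, (-2.0, 0.3, 1.5))]
lambdas = (0.2, 0.3, 0.4)
assert log_mgf_combination(outcomes, lambdas) <= weighted_log_mgfs(outcomes, lambdas) + 1e-12
```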



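Lemma 5.7. Let $\zeta$ be a nonnegative random variable and $\varphi(\lambda) = \log\mathbb{E}\exp(\lambda\zeta)$.
Then for any $r > 0$

$$(\mathbb{E}\zeta^r)^{1/r} \le \inf_{\lambda:\,\varphi(\lambda)\ge r}\lambda^{-1}\varphi(\lambda). \qquad (5.7)$$

In particular, if $\varphi(\lambda) \le a + \sigma^2\lambda^2$ for some $a, \sigma > 0$, then

$$(\mathbb{E}\zeta^r)^{1/r} \le 2\sigma\sqrt{\max\{a, r/2\}}. \qquad (5.8)$$
Proof. Consider the following function

$$f(x) = \begin{cases} \log^r(x) & \text{for } x \ge e^r, \\ x\,r^r/e^r & \text{for } x < e^r. \end{cases}$$

A simple algebra reveals that for $x > e^r$

$$f'(x) = rx^{-1}\log^{r-1}(x),$$

$$f''(x) = r(r-1)x^{-2}\log^{r-2}(x) - rx^{-2}\log^{r-1}(x) = rx^{-2}[r - 1 - \log(x)]\log^{r-2}(x) \le 0.$$

Since the function $f(x)$ is linear for $x < e^r$, it is concave for all $x > 0$. It
is also easy to check that $[\log(x)]^r \le f(x)$ for $x \ge 1$, because for $x < e^r$, the function
$f(x)$ coincides with the tangent of $\log^r(x)$ at $x = e^r$. Therefore,

$$\zeta^r = \lambda^{-r}\log^r(e^{\lambda\zeta}) \le \lambda^{-r}f(e^{\lambda\zeta})$$

and the Jensen inequality implies for any $\lambda > 0$

$$\mathbb{E}\zeta^r \le \lambda^{-r}\mathbb{E}f(e^{\lambda\zeta}) \le \lambda^{-r}f(\mathbb{E}e^{\lambda\zeta}) = \lambda^{-r}f(e^{\varphi(\lambda)}). \qquad (5.9)$$

If $\varphi(\lambda) \ge r$, then $f(e^{\varphi(\lambda)}) = \log^r(e^{\varphi(\lambda)}) = \varphi^r(\lambda)$ and (5.7) follows from (5.9).

To prove (5.8), it remains to notice that the monotonicity of $f(\cdot)$ implies, in
view of (5.9), that

$$(\mathbb{E}\zeta^r)^{1/r} \le \inf_{\lambda:\,a+\sigma^2\lambda^2\ge r}\Bigl\{\frac{a}{\lambda} + \sigma^2\lambda\Bigr\} = \begin{cases} \sigma r(r-a)^{-1/2}, & a < r/2, \\ 2\sigma\sqrt{a}, & a \ge r/2, \end{cases}$$

$$\le \begin{cases} 2\sigma\sqrt{r/2}, & a < r/2, \\ 2\sigma\sqrt{a}, & a \ge r/2 \end{cases} \;=\; 2\sigma\sqrt{\max\{a, r/2\}}.$$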
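
Both bounds of Lemma 5.7 can be illustrated numerically. The Python sketch below uses hypothetical function names and a discrete nonnegative variable chosen only for illustration. The infimum in (5.7) is approximated from above by a minimum over a grid of $\lambda$ values, and the closed-form constrained infimum from the end of the proof is compared with the right-hand side of (5.8).

```python
import math

def log_mgf(values, probs, lam):
    """phi(lam) = log E exp(lam * zeta) for a discrete random variable."""
    return math.log(sum(p * math.exp(lam * v) for v, p in zip(values, probs)))

def moment_norm(values, probs, r):
    """(E zeta^r)^(1/r)."""
    return sum(p * v**r for v, p in zip(values, probs)) ** (1.0 / r)

def rate_bound(values, probs, r, lambdas):
    """Minimum of phi(lam)/lam over grid points with phi(lam) >= r (at least the infimum in (5.7))."""
    return min(
        log_mgf(values, probs, lam) / lam
        for lam in lambdas
        if log_mgf(values, probs, lam) >= r
    )

def constrained_infimum(a, sigma, r):
    """inf of a/lam + sigma^2 * lam over {lam > 0 : a + sigma^2 * lam^2 >= r}."""
    if a < r / 2.0:
        return sigma * r / math.sqrt(r - a)
    return 2.0 * sigma * math.sqrt(a)

def subgaussian_bound(a, sigma, r):
    """Right-hand side of (5.8)."""
    return 2.0 * sigma * math.sqrt(max(a, r / 2.0))

values, probs = [0.0, 1.0, 2.0], [1 / 3, 1 / 3, 1 / 3]
grid = [0.05 * k for k in range(1, 201)]
assert moment_norm(values, probs, 2) <= rate_bound(values, probs, 2, grid)
```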
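Lemma 5.8. Let a r.v. $\xi$ fulfill $\mathbb{E}\xi = 0$, $\mathbb{E}\xi^2 = 1$ and $\mathbb{E}\exp(\lambda_1|\xi|) = \varkappa < \infty$
for some $\lambda_1 > 0$. Then for any $\rho < 1$ there is a constant $C_1$ depending on $\varkappa$,
$\lambda_1$ and $\rho$ only such that for $\lambda < \rho\lambda_1$

$$\log\mathbb{E}e^{\lambda\xi} \le C_1\lambda^2/2.$$

Moreover, there is a constant $\lambda_2 > 0$ such that for all $\lambda \le \lambda_2$

$$\log\mathbb{E}e^{\lambda\xi} \ge \rho\lambda^2/2.$$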

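Proof. Define $h(x) = (\lambda - \lambda_1)x + m\log(x)$ for $m \ge 0$ and $\lambda < \lambda_1$. It is easy
to see by a simple algebra that

$$\max_{x\ge 0}h(x) = -m + m\log\frac{m}{\lambda_1 - \lambda}.$$

Therefore for any $x \ge 0$

$$\lambda x + m\log(x) \le \lambda_1x + \log\Bigl(\frac{m}{e(\lambda_1 - \lambda)}\Bigr)^m.$$
This implies for all $\lambda < \lambda_1$

$$\mathbb{E}|\xi|^m\exp(\lambda|\xi|) \le \Bigl(\frac{m}{e(\lambda_1 - \lambda)}\Bigr)^m\mathbb{E}\exp(\lambda_1|\xi|).$$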

Suppose now that for some $\lambda_1 > 0$, it holds $\mathbb{E}\exp(\lambda_1|\xi|) = \varkappa(\lambda_1) < \infty$. Then
the function $h_0(\lambda) = \mathbb{E}\exp(\lambda\xi)$ fulfills $h_0(0) = 1$, $h_0'(0) = \mathbb{E}\xi = 0$, $h_0''(0) = 1$
and for $\lambda < \lambda_1$,

This implies by the Taylor expansion for $\lambda < \rho\lambda_1$ that

$$h_0(\lambda) \le 1 + C_1\lambda^2/2$$
with $C_1 = \varkappa(\lambda_1)/\{\lambda_1^2(1-\rho)^2\}$, and hence, $\log h_0(\lambda) \le C_1\lambda^2/2$.